Abstract:

>Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis, robotic planning, and theorem proving. We show that LLMs generating Parsel solve more competition-level problems in the APPS dataset, resulting in pass rates that are over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. We also find that LLM-generated robotic plans using Parsel as an intermediate language are more than twice as likely to be considered accurate than directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers.

https://preview.redd.it/tlija53is6fa1.jpg?width=811&format=pjpg&auto=webp&v=enabled&s=a58ec9215ce75dc2437a630dc9597806194da498

https://preview.redd.it/fc2bb93is6fa1.jpg?width=1638&format=pjpg&auto=webp&v=enabled&s=0d11527496bb4f7e9f53df69df397f892828e8ef

https://preview.redd.it/nr4qy83is6fa1.jpg?width=711&format=pjpg&auto=webp&v=enabled&s=e18e5b6c51a68305d195faaf4c92e78914d078a6

https://preview.redd.it/afko1a3is6fa1.jpg?width=1468&format=pjpg&auto=webp&v=enabled&s=5f91482aa9a6a275e03f85c13ea4593d9b958d02

https://preview.redd.it/p2omd73is6fa1.jpg?width=1177&format=pjpg&auto=webp&v=enabled&s=1f3c793b6e548c8c5e0227e94fb18b879bdbbeff

Comments

FirstOrderCat t1_j6ieqjp wrote on January 30, 2023 at 3:42 PM

#1,616,141

> Beats prior code generation sota by over 75%!

But on a different metric: pass@50 vs. pass@8x16.
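For anyone unfamiliar with pass@k: it is the probability that at least one of k sampled programs passes all tests. Below is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k)/C(n, k), computed from n samples of which c are correct. The function name `pass_at_k` is my own, and this only covers plain pass@k, not the pass@8x16 variant reported for Parsel.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which pass all tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, any draw of k contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Because the two sampling budgets are structured differently, the numbers are not directly comparable unless the total number of samples and test evaluations is accounted for.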

ihateshadylandlords t1_j6iioib wrote on January 30, 2023 at 4:08 PM

#1,616,989

I asked ChatGPT to summarize it in layman’s terms, because I couldn’t understand the abstract:

Parsel is a tool that helps computer programs called large language models (LLMs) better solve complex tasks. Normally, these LLMs have trouble with tasks that require multiple steps, like creating complicated programs. Parsel helps the LLMs by taking descriptions of the task in everyday language and turning it into code that the LLMs can understand. This makes the LLMs better at solving tasks like creating programs, planning for robots, and proving theories. Tests show that using Parsel leads to better results and more accurate answers compared to other methods. Parsel may also be helpful for human programmers in the future.

94746382926 t1_j6ius97 wrote on January 30, 2023 at 5:25 PM

#1,619,701

I'm not familiar with those benchmarks. Is this still a big deal?

starstruckmon t1_j6j4jxi wrote on January 30, 2023 at 6:25 PM

#1,622,017

Replying to ihateshadylandlords (#1,616,989)

Even if the LLMs themselves don't become perfect at generating Parsel pseudocode, having a compiler LM that can reliably convert Parsel (or something similar) to actual code would be a massive win. Imagine coding in natural-language pseudocode: a higher-level programming language.

ezelikman t1_j6kib8v wrote on January 30, 2023 at 11:40 PM

#1,633,972

Replying to starstruckmon (#1,622,017)

>having a compiler LM that can reliably convert Parsel (or something similar) to actual code would be a massive win. Imagine coding in natural-language pseudocode

We made this available here!: https://github.com/ezelikman/parsel

And there's a notebook here: https://colab.research.google.com/github/ezelikman/parsel/blob/main/parsel.ipynb

Hopefully, there'll be a nicer IDE integration at some point in the nearish future!

ezelikman t1_j6kkch1 wrote on January 30, 2023 at 11:54 PM

#1,634,525

Replying to ihateshadylandlords (#1,616,989)

Here's another slightly longer TL;DR:

Humans solve hard problems by breaking them down into parts and solving them part by part. We normally ask language models to solve algorithmic problems in one go (or if they revise their solutions, we expect them to revise everything). This has been known to be a problem for a while. It turns out, maybe unsurprisingly, that by asking language models to break problems down and then implementing subparts independently, we get way better results.

We do this by writing a programming language (basically, English with indentation plus a small amount of syntax for tests and references). We design an LLM-powered compiler around it to generate programs efficiently. We show it works on solving competitive coding problems, robotic task planning, and math theorem proving. We also show that it's decently robust: it is able to implement a bare-bones Lisp compiler in a few dozen lines.
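To make the "implement subparts independently, then check with tests" idea concrete, here is a rough Python sketch. It is not the actual Parsel compiler or its syntax: the hard-coded candidate lists stand in for LLM samples, and every name in it (`CHILD_CANDIDATES`, `PARENT_CANDIDATES`, `find_passing_combination`) is made up for illustration.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

# Hypothetical candidates for the child description "square a number".
# In a real system these would be sampled from a code LLM.
CHILD_CANDIDATES: List[Callable[[int], int]] = [
    lambda x: x * 2,   # plausible but wrong
    lambda x: x * x,   # correct
]

# Hypothetical candidates for the parent description "sum of squares of a list".
# Each one is a factory that takes a child implementation and returns the parent.
PARENT_CANDIDATES: List[Callable[[Callable[[int], int]], Callable[[List[int]], int]]] = [
    lambda square: (lambda xs: sum(square(x) for x in xs)),  # correct
    lambda square: (lambda xs: square(sum(xs))),             # wrong: squares the sum
]

# Tests attached to the parent: (input, expected output).
TESTS: List[Tuple[List[int], int]] = [([1, 2, 3], 14), ([2], 4)]


def find_passing_combination(children, parents, tests) -> Optional[Tuple[int, int]]:
    """Return indices (child, parent) of the first combination passing all tests."""
    for (ci, child), (pi, make_parent) in product(enumerate(children), enumerate(parents)):
        candidate = make_parent(child)
        try:
            if all(candidate(arg) == expected for arg, expected in tests):
                return ci, pi
        except Exception:
            # A crashing candidate simply fails; keep searching.
            continue
    return None


print(find_passing_combination(CHILD_CANDIDATES, PARENT_CANDIDATES, TESTS))  # (1, 0)
```

Note that the single test `([2], 4)` would also accept the wrong child (`x * 2`); the second test is what rules it out. The search space grows with the product of candidates per function, which is one reason the sampling budget matters.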

cosyrelaxedsetting t1_j6lxkpn wrote on January 31, 2023 at 6:46 AM

#1,649,135

Crazy impressive. Many jobs will be on the chopping block in the next 10 years.

[deleted] t1_j6ov8k8 wrote on January 31, 2023 at 9:18 PM

#1,682,527

Replying to cosyrelaxedsetting (#1,649,135)

I doubt it. This will make coders far more productive, but there will still need to be people who know how to translate a real-world application into a prompt, and then check the code to ensure it does what they need. I foresee more of a shift to field work.