Submitted by Singularian2501 t3_10p3ac7 in singularity
Paper: https://arxiv.org/abs/2212.10561
Github: https://github.com/ezelikman/parsel
Twitter: https://twitter.com/ericzelikman/status/1618426056163356675?s=20
Website: https://zelikman.me/parselpaper/
Code Generation on APPS Leaderboard: https://paperswithcode.com/sota/code-generation-on-apps
Abstract:
>Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis, robotic planning, and theorem proving. We show that LLMs generating Parsel solve more competition-level problems in the APPS dataset, resulting in pass rates that are over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. We also find that LLM-generated robotic plans using Parsel as an intermediate language are more than twice as likely to be considered accurate than directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers.
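For anyone who wants a concrete picture of "hierarchical function descriptions" plus "automatic implementation and validation", here is a rough toy sketch in Python. It is not Parsel's actual syntax or API: `FunctionSpec`, `passes`, `implement`, and the hard-coded `candidates` lists (standing in for code sampled from an LLM) are all made up for illustration. It also just takes the first candidate that passes and ignores recursion between functions, which the real system would have to handle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FunctionSpec:
    """One node in a hierarchical description (hypothetical, not Parsel's format)."""
    name: str
    description: str                      # natural-language description of the function
    children: List[str] = field(default_factory=list)   # helper functions it may call
    tests: List[Tuple[tuple, object]] = field(default_factory=list)  # (args, expected)
    # Stand-in for LLM samples: each candidate receives the already-chosen
    # helpers and returns a callable implementation.
    candidates: List[Callable[[Dict[str, Callable]], Callable]] = field(default_factory=list)


def passes(fn: Callable, tests: List[Tuple[tuple, object]]) -> bool:
    """Return True if fn produces the expected output on every test case."""
    for args, expected in tests:
        try:
            if fn(*args) != expected:
                return False
        except Exception:
            return False
    return True


def implement(specs: Dict[str, FunctionSpec], root: str) -> Dict[str, Callable]:
    """Implement children first, then pick the first candidate that passes its tests."""
    chosen: Dict[str, Callable] = {}

    def visit(name: str) -> None:
        if name in chosen:
            return
        spec = specs[name]
        for child in spec.children:
            visit(child)
        helpers = {child: chosen[child] for child in spec.children}
        for make in spec.candidates:
            fn = make(helpers)
            if passes(fn, spec.tests):
                chosen[name] = fn
                return
        raise ValueError(f"no candidate for {name!r} passed its tests")

    visit(root)
    return chosen


specs = {
    "square": FunctionSpec(
        name="square",
        description="return x times x",
        tests=[((3,), 9)],
        candidates=[
            lambda helpers: (lambda x: x + x),   # wrong sample, rejected by the test
            lambda helpers: (lambda x: x * x),   # correct sample
        ],
    ),
    "sum_of_squares": FunctionSpec(
        name="sum_of_squares",
        description="sum the squares of a list of numbers",
        children=["square"],
        tests=[(([1, 2, 3],), 14)],
        candidates=[
            lambda helpers: (lambda xs: sum(helpers["square"](x) for x in xs)),
        ],
    ),
}

solution = implement(specs, "sum_of_squares")
result = solution["sum_of_squares"]([1, 2, 3])
```

The point of the toy is only the shape of the idea: the top-level task is split into small described functions, each one gets several attempted implementations, and tests decide which attempts are kept.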
ihateshadylandlords t1_j6iioib wrote
I asked ChatGPT to summarize it in layman’s terms, because I couldn’t understand the abstract:
Parsel is a tool that helps computer programs called large language models (LLMs) solve complex tasks. Normally, these LLMs have trouble with tasks that require many steps, like writing complicated programs. Parsel helps by taking a task that has been broken down into smaller functions, each described in everyday language, and then having a code-writing LLM implement and check each piece automatically. This makes the LLMs better at tasks like writing programs, planning for robots, and proving theorems. In tests on competition-level problems from the APPS dataset, pass rates were over 75% higher than prior results from sampling AlphaCode and Codex directly, and robot plans written with Parsel were more than twice as likely to be judged accurate as plans generated directly. Parsel may also be useful for human programmers.