Submitted by Singularian2501 t3_10p3afl in MachineLearning
Paper: https://arxiv.org/abs/2212.10561
Github: https://github.com/ezelikman/parsel
Twitter: https://twitter.com/ericzelikman/status/1618426056163356675?s=20
Website: https://zelikman.me/parselpaper/
Code Generation on APPS Leaderboard: https://paperswithcode.com/sota/code-generation-on-apps
Abstract:
>Despite recent success in large language model (LLM) reasoning, LLMs struggle with hierarchical multi-step reasoning tasks like generating complex programs. For these tasks, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, taking hierarchical function descriptions in natural language as input. We show that Parsel can be used across domains requiring hierarchical reasoning, including program synthesis, robotic planning, and theorem proving. We show that LLMs generating Parsel solve more competition-level problems in the APPS dataset, resulting in pass rates that are over 75% higher than prior results from directly sampling AlphaCode and Codex, while often using a smaller sample budget. We also find that LLM-generated robotic plans using Parsel as an intermediate language are more than twice as likely to be considered accurate than directly generated plans. Lastly, we explore how Parsel addresses LLM limitations and discuss how Parsel may be useful for human programmers.
farmingvillein t1_j6iwb5v wrote
I like the big idea, and it is almost certainly indicative of one of the key tools to improve automated programming.
That said, I wish they had resisted the urge to build an intermediate programming language. This is likely unnecessary, and it is the type of semi-convoluted solution that you only come up with in an academic research lab (or out of true, deep product need, but I think that is highly unlikely to be the case here).
My guess is that the same basic result in the paper could have been shown using Python or Rust or similar as the root language, with a little work (time they could have freed up by skipping the Harry Potter language development).
They do note:
> We generate 16 Python implementations per high-level plan on 100 randomly sampled problems and find that the performance drops to 6%.
But it isn't well discussed (unless I skimmed too quickly) why a separate language is truly needed. They discuss advantages of Parsel, but there doesn't appear to be a deep ablation on why it is really necessary, where its supposed performance benefits come from, or how those could be enforced in other languages.
There is a bunch of discussion in the appendix, but IMO none of it is very convincing. E.g., Parsel enforces certain conventions around testing and validation...great, let's do that in Python or Rust or similar. Or, leveraging the value of LLMs, through a more natural-language interface.
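To make the "do that in Python" point concrete, here is a rough sketch of what I have in mind: each function gets a natural-language description plus input/output examples, and candidate implementations (which would come from an LLM in practice) are kept only if they pass. To be clear, this is my own illustration and not how Parsel is implemented; `Spec`, `passes`, and `select_implementation` are hypothetical names, and the candidates here are hand-written stand-ins for sampled code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Spec:
    """Hypothetical container: one function's description, examples, and sub-functions."""
    name: str
    description: str
    examples: List[Tuple[tuple, Any]] = field(default_factory=list)
    children: List["Spec"] = field(default_factory=list)  # hierarchical decomposition


def passes(spec: Spec, impl: Callable) -> bool:
    """Return True only if impl reproduces every example attached to spec."""
    for args, expected in spec.examples:
        try:
            if impl(*args) != expected:
                return False
        except Exception:
            # A candidate that crashes on an example is rejected.
            return False
    return True


def select_implementation(spec: Spec, candidates: List[Callable]) -> Optional[Callable]:
    """Return the first candidate that satisfies the spec, or None if none do."""
    for candidate in candidates:
        if passes(spec, candidate):
            return candidate
    return None


# Stand-ins for LLM-sampled implementations of the same description.
square = Spec("square", "Return x squared.", examples=[((3,), 9), ((-2,), 4)])
candidates = [lambda x: x * 2, lambda x: x * x]
chosen = select_implementation(square, candidates)
```

Nothing here needs a new language: the description, the examples, and the hierarchy all live in ordinary Python data, and the filtering step is a plain loop.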
Yes, there is benefit to bridging these gaps in a "universal" manner...but, as per https://xkcd.com/927/, a new programming language is rarely the right solution.