navillusr t1_j47wc3c wrote on January 13, 2023 at 7:54 PM

Reply to comment by throwaway2676 in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents

It’s definitely a hard problem. The challenge isn’t a pipeline problem of “solve this reasoning task,” where you can just take the English task -> convert to code -> run code -> convert to English answer. We could probably do that with some degree of accuracy in some contexts.

The hard part is having the agent solve reasoning tasks as they appear, without prompt engineering and without being told that it’s a reasoning task. In essence, it should be able to combine reasoning and planning seamlessly with the generative side of intelligence, not just piece them together when you tell it to outsource the task to a reasoning engine (assuming it could even do this accurately).

For example, if you ask ChatGPT to play rock paper scissors, but to choose the option that beats the option that beats the option you pick (i.e., if I pick Rock, it should pick Scissors, because scissors beats paper, which beats rock), it can’t plan that far ahead.

> Let’s play a modified version of Rock Paper Scissors, but to win, you have to pick the option that beats the option that beats the option that I pick.

> Sure, I'd be happy to play a modified version of Rock Paper Scissors with you. Please go ahead and make your selection, and I'll pick the option that beats the option that beats it.

> Rock

> In that case, I will pick paper.

Since this game requires two steps of thinking and goes against the statistically likely answer in this scenario, it fails: one step from rock gives paper, but the correct two-step answer is scissors. As you described, you could maybe write code that identifies a rock paper scissors game, generates and runs code, then answers in English, but there are many real-world tasks that require more than one step of planning that the agent needs to be able to seamlessly identify and work through. (For the record, it also outputs incorrect Python code for this game when prompted.)
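
For what it's worth, here is a minimal sketch of what correct code for the modified game could look like. This is my own illustration, not what ChatGPT produced; the names `BEATS`, `BEATEN_BY`, and `counter` are made up for the example.

```python
# Which option each option defeats in standard Rock Paper Scissors.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

# Inverse lookup: for each option, the option that defeats it.
BEATEN_BY = {loser: winner for winner, loser in BEATS.items()}


def counter(choice: str, steps: int = 1) -> str:
    """Apply "the option that beats ..." `steps` times, starting from `choice`."""
    result = choice.lower()
    for _ in range(steps):
        result = BEATEN_BY[result]
    return result


# The standard game is one step; the modified game above is two steps.
print(counter("Rock", steps=1))  # paper
print(counter("Rock", steps=2))  # scissors
```

The point is that the rule is trivial once it's written as "apply the same lookup twice"; the failure in the transcript is stopping after the first lookup.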

I don’t do research in this specific area, so again I could be off base here, but I think that’s why it’s harder than you’re imagining.

Fwiw, there was a recent paper (the method was called Mind’s Eye) where they used an LLM to generate physics simulator code to answer physics questions, similar to what you described.