thunderdome t1_is9cigc wrote
Reply to [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
Really interesting. I wish they had included more details on the text-to-code models. They share a table showing that increasing model size increases accuracy, but they apparently never train it beyond 1.5B parameters? It would be interesting to know how much of the remaining error on their benchmark comes from the text-to-code generation versus the foundation models, or even just to see a standalone accuracy measure for the text-to-code step.
Regardless, it's super cool to see the number 100 popping up in benchmarks like this.