nogop1 t1_j27xexh wrote
Reply to [R] LAMBADA: Backward Chaining for Automated Reasoning in Natural Language - Google Research 2022 - Significantly outperforms Chain of Thought and Select Inference in terms of prediction accuracy and proof accuracy. by Singularian2501
I wonder whether the large models are better not because of their larger parameter count, but because of their increased number of layers, letting them perform more steps and search more deeply.

I've been wondering whether certain questions/algorithms require a variable number of steps. Leaving aside the universal function approximation theorem, wouldn't simple exponentiation require that, if I were to ask an LLM/transformer to perform such arithmetic operations?
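As a rough illustration of what I mean (my own sketch, not from the paper): exponentiation by squaring needs a number of *sequential* multiply steps that grows with the exponent (roughly log2 of it), while a transformer's forward pass has a fixed number of layers, so any fixed depth is exceeded for large enough exponents unless intermediate results are externalized (e.g. via chain of thought).

```python
# Sketch: count the sequential multiply steps needed for base**exp
# via exponentiation by squaring. The step count grows ~log2(exp),
# while a transformer forward pass has a fixed number of layers.

def pow_by_squaring(base: int, exp: int) -> tuple[int, int]:
    """Return (base**exp, number of sequential squaring steps used)."""
    result, steps = 1, 0
    while exp > 0:
        if exp & 1:
            result *= base
        base *= base
        exp >>= 1
        steps += 1
    return result, steps

if __name__ == "__main__":
    for e in (8, 64, 4096):
        _, steps = pow_by_squaring(3, e)
        print(f"3**{e}: {steps} sequential steps")
    # A model with L layers can only do O(L) such sequential steps in one
    # forward pass, so the required depth outgrows any fixed L.
```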