nogop1 t1_j27xexh wrote

I wonder whether the large models are better not because of their larger parameter count, but because of their increased number of layers, letting them perform more steps and search more deeply.

I've been wondering whether certain questions/algorithms require a variable number of steps. Leaving aside the universal function approximation theorem, wouldn't simple exponentiation require that? What if I were to ask an LLM/transformer to perform these arithmetic operations?
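As a rough sketch of the intuition (my own illustration, not from the comment): computing b^n by repeated multiplication takes a number of sequential multiplies that grows with n, whereas a transformer's forward pass has a fixed depth set by its layer count.

```python
def pow_with_step_count(base: int, exponent: int) -> tuple[int, int]:
    """Compute base**exponent by repeated multiplication,
    returning the result and the number of sequential multiplies."""
    result, steps = 1, 0
    for _ in range(exponent):
        result *= base
        steps += 1
    return result, steps

# The step count scales with the input, unlike a fixed layer stack:
for n in (2, 8, 32):
    _, steps = pow_with_step_count(3, n)
    print(n, steps)  # steps == n
```

Repeated squaring would cut this to about log2(n) multiplies, but the depth still depends on the input, which is the point.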