nogop1 t1_j27xexh wrote
I wonder whether large models are better not because of their larger parameter count, but because of their increased number of layers, which would let them perform more steps and search more deeply.
I've been wondering whether certain questions/algorithms need a variable number of steps. Leaving aside the universal approximation theorem, wouldn't simple exponentiation require that, if I were to ask an LLM/transformer to perform such arithmetic operations?
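To separate parameter count from depth, here is a rough sketch using the common approximation of about 12 * n_layers * d_model^2 parameters for the transformer blocks. It assumes a feed-forward width of 4 * d_model and ignores embeddings, biases, and layer norms. The function name and the two configurations are made up for illustration.

```python
def approx_block_params(n_layers, d_model):
    """Approximate transformer-block parameter count (hypothetical helper).

    Per layer: about 4 * d_model^2 for the attention projections (Q, K, V, output)
    plus about 8 * d_model^2 for a feed-forward block of width 4 * d_model.
    Embeddings, biases, and layer norms are ignored.
    """
    return 12 * n_layers * d_model ** 2


# Two illustrative configurations with the same approximate parameter budget:
shallow_wide = approx_block_params(n_layers=12, d_model=768)
deep_narrow = approx_block_params(n_layers=48, d_model=384)
print(shallow_wide, deep_narrow)  # both 84934656
```

Under this approximation, the two configurations have the same size but the second runs four times as many sequential layers, so depth and parameter count can in principle be varied independently.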
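To make the variable-step point concrete, here is a minimal sketch of exponentiation by squaring that also counts its loop iterations. It is only an illustration of a classical algorithm, not a claim about what a transformer computes internally; the function name is made up.

```python
def power_by_squaring(base, exponent):
    """Return (base ** exponent, number of loop iterations) for exponent >= 0."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1
    steps = 0
    while exponent > 0:
        if exponent & 1:      # lowest bit set: fold the current power into the result
            result *= base
        base *= base          # square for the next bit
        exponent >>= 1        # move on to the next bit of the exponent
        steps += 1
    return result, steps


print(power_by_squaring(3, 5))        # (243, 3)
print(power_by_squaring(2, 1000)[1])  # 10
```

The iteration count equals the bit length of the exponent, so it grows with the input, whereas a single forward pass through a fixed stack of layers always performs the same number of sequential steps.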
currentscurrents t1_j2csenb wrote
The number of layers is a hyperparameter, and people run hyperparameter optimization to find good values for it.
Model size does seem to follow a real scaling law. It's possible that we will come up with better algorithms that work on smaller models, but it's also possible that neural networks need to be big to be useful. With billions of neurons and an even larger number of connections/parameters, the human brain is certainly a very large network.