Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
alrunan t1_jdmbv4k wrote
Reply to comment by harharveryfunny in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The Chinchilla scaling laws are just used to calculate the optimal dataset and model size for a particular training compute budget.
You should read the LLaMA paper.
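To make the trade-off concrete, here is a minimal sketch of the commonly cited Chinchilla rules of thumb: training compute C ≈ 6·N·D (N = parameters, D = training tokens) and a compute-optimal ratio of roughly 20 tokens per parameter. The function name and the example budget are illustrative, not from the thread.

```python
import math

def chinchilla_optimal(compute_budget_flops):
    # Rule of thumb from the Chinchilla paper: C ~= 6 * N * D,
    # with the compute-optimal ratio D ~= 20 * N.
    # Substituting gives C = 120 * N^2, so N = sqrt(C / 120).
    n_params = math.sqrt(compute_budget_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Example: a budget near Chinchilla-70B scale (~5.76e23 FLOPs)
params, tokens = chinchilla_optimal(5.76e23)
print(f"params ~ {params:.2e}, tokens ~ {tokens:.2e}")
```

For that budget the sketch lands near 7e10 parameters and 1.4e12 tokens, i.e. a ~70B model trained on ~1.4T tokens, which is the regime the discussion above is about.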
harharveryfunny t1_jdmd38s wrote
>You should read the LLaMA paper.
OK - will do. What specifically did you find interesting (related to scaling or not)?