Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
alrunan t1_jdmbv4k wrote
Reply to comment by harharveryfunny in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The Chinchilla scaling laws are just used to calculate the optimal dataset and model size for a particular training compute budget.
You should read the LLaMA paper.
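To make the trade-off concrete, here is a minimal sketch of the commonly cited Chinchilla rules of thumb: training compute C ≈ 6·N·D (N = parameters, D = training tokens) and a compute-optimal ratio of roughly 20 tokens per parameter. The function name and the example budget are illustrative, not from the thread.

```python
import math

def chinchilla_optimal(compute_budget_flops):
    # Rule of thumb from the Chinchilla paper: C ~= 6 * N * D,
    # with the compute-optimal ratio D ~= 20 * N.
    # Substituting gives C = 120 * N^2, so N = sqrt(C / 120).
    n_params = math.sqrt(compute_budget_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Example: a budget near Chinchilla-70B scale (~5.76e23 FLOPs)
params, tokens = chinchilla_optimal(5.76e23)
print(f"params ~ {params:.2e}, tokens ~ {tokens:.2e}")
```

For that budget the sketch lands near 7e10 parameters and 1.4e12 tokens, i.e. a ~70B model trained on ~1.4T tokens, which is the regime the discussion above is about.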
harharveryfunny t1_jdmd38s wrote
>You should read the LLaMA paper.
OK - will do. What specifically did you find interesting (related to scaling or not)?