Submitted by __Maximum__ t3_11l3as6 in MachineLearning
__Maximum__ OP t1_jbdr6zj wrote
Reply to comment by Taenk in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Not quite. The Chinchilla result is about a fixed compute budget: given that budget, you should scale parameters and training tokens together, at roughly 20 tokens per parameter. So a 1B-parameter model would be compute-optimal at around 20B training tokens. The figures in the Chinchilla paper demonstrate this nicely.
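As a rough sketch of that rule of thumb (in Python; the helper name is mine, and the flat ~20 tokens-per-parameter ratio is a simplification — the paper fits it with three different estimation methods):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training tokens for a given parameter count (~20x rule of thumb)."""
    return n_params * tokens_per_param

def approx_train_flops(n_params: float, n_tokens: float) -> float:
    """Standard training-compute approximation C ~= 6 * N * D."""
    return 6.0 * n_params * n_tokens

for n_params in (1e9, 7e9, 70e9):
    tokens = chinchilla_optimal_tokens(n_params)
    flops = approx_train_flops(n_params, tokens)
    print(f"{n_params/1e9:.0f}B params -> ~{tokens/1e9:.0f}B tokens (~{flops:.1e} FLOPs)")
```

LLaMA deliberately trains past this point: the extra tokens are compute-suboptimal for training but buy a smaller, cheaper model at inference time.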