Submitted by Angry_Grandpa_ t3_y92cl1 in singularity
manOnPavementWaving t1_it4r8la wrote
Reply to comment by Angry_Grandpa_ in A YouTube large language model for a scant $35 million. by Angry_Grandpa_
I have read the paper, which is how I know that they scale data and parameters equally, meaning a 10x increase in data requires 100x the compute and hence 100x the cost.
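To make that arithmetic concrete, here is a rough sketch. It assumes the common rule of thumb that training compute is about 6 x parameters x tokens, and that cost is proportional to compute. The baseline sizes are hypothetical placeholders, not figures from the paper or from the original estimate.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute, ASSUMING the rule of thumb C ~ 6 * N * D."""
    return 6.0 * params * tokens


def compute_multiplier(data_scale: float) -> float:
    """Factor by which compute grows when data AND parameters are both
    scaled by `data_scale` (the equal-scaling case described above)."""
    # Hypothetical baseline; the ratio does not depend on these values.
    base_params, base_tokens = 1.0e9, 2.0e10
    base = training_flops(base_params, base_tokens)
    scaled = training_flops(base_params * data_scale, base_tokens * data_scale)
    return scaled / base


if __name__ == "__main__":
    # With 10x the data (and therefore 10x the parameters),
    # compute grows by roughly 10 * 10 = 100x.
    print(compute_multiplier(10.0))
```

Because compute is a product of two quantities that both grow with the data, the multiplier is the square of the data scale factor, which is where the 100x comes from.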
As for the assumptions, I'm looking more at the number of words on YouTube; your estimate is likely wildly off.
You're also ignoring that the training time could well be long enough that waiting for better GPUs to come out would be the better strategy.
Angry_Grandpa_ OP t1_it50h5j wrote
What is your estimate?