FirstOrderCat t1_itc6pne wrote on October 22, 2022 at 3:00 PM

It looks like they hit a point of diminishing returns somewhere around 0.5*1e25 FLOPs.

After that, the model trains much more slowly. They could continue training further and say they "saved" another 20M TPU hours.