Submitted by xutw21 t3_ya21gp in singularity
FirstOrderCat t1_itc6pne wrote
Reply to comment by Spoffort in U-PaLM 540B by xutw21
It looks like they hit the point of diminishing returns somewhere around 0.5*1e25 FLOPs.
After that, the model trains much more slowly. They could continue training further and say they "saved" another 20M TPU hours.
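A rough sketch of why the "saved" figure grows the further out you train, assuming the baseline loss follows a simple power law in compute. The constants `A` and `B` and the helper names are made up for illustration and are not fitted to any PaLM or U-PaLM numbers:

```python
# Hypothetical baseline loss curve: loss(C) = A * C**(-B).
# A and B are illustration values only, NOT taken from the paper.
A = 10.0
B = 0.05


def loss(compute):
    """Baseline loss after spending `compute` FLOPs (assumed power law)."""
    return A * compute ** (-B)


def compute_to_reach(target_loss):
    """Invert the power law: FLOPs the baseline needs to hit `target_loss`."""
    return (A / target_loss) ** (1.0 / B)


def claimed_savings(baseline_compute, loss_gain):
    """Extra FLOPs the baseline would need to match a fixed loss improvement."""
    target = loss(baseline_compute) - loss_gain
    return compute_to_reach(target) - baseline_compute


# Same loss improvement, measured at two points on the curve.
early = claimed_savings(1e24, 0.01)
late = claimed_savings(1e25, 0.01)
print(f"savings at 1e24 FLOPs: {early:.2e}")
print(f"savings at 1e25 FLOPs: {late:.2e}")
```

Under these assumptions the same small improvement "saves" more compute when it is measured later on the flat part of the curve, so the headline number depends heavily on where the comparison is made.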