camp4climber t1_jccy0qa wrote
Reply to comment by farmingvillein in [D] Is there an expectation that epochs/learning rates should be kept the same between benchmark experiments? by TheWittyScreenName
Yea that's a fair point. These kinds of examples certainly exist and often come from large research labs at the very edge of the state of the art, where the interesting narrative point is scale. The context of specific benchmarks or applications certainly matters.
I still think my point stands in the general case, at least for most of us independent researchers. Ultimately research is about revealing novel insights. "Train for longer" is not that interesting on its own. But an LLM that fits on a single GPU, contains 13B parameters, and is capable of outperforming a 175B parameter model is certainly interesting.