MrTacobeans t1_jcbj1kw wrote
This is coming from a total layman's point of view, someone who follows AI news pretty closely, but anyway...
Wouldn't running a tighter learning rate and training for more epochs reduce many of the benefits of a NN outside of a synthetic benchmark?
From what I know, a NN can be loosely trained, helpfully "hallucinate" to fill in the gaps it doesn't know, and still be useful. When the network is constricted it might be extremely accurate on the benchmark and smaller than the loose model, but those intrinsically useful/good hallucinations will be lost, and it will hallucinate worse than the loose model on anything outside the benchmark.
I give props to AI engineers; this all seems like an incredibly delicate balance, which is probably why massive amounts of data are needed to keep from falling to either side of it.
I feel like there's no need to enforce a fixed epoch count or learning schedule in benchmarks, because models usually converge to their best versions at different points regardless of the data used, and if the authors are writing a paper they've likely tweaked something worth training and writing about beyond just beating a benchmark.
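To make the "models converge to their best versions at different points" bit concrete, here's a minimal early-stopping sketch in PyTorch (my own toy illustration, not anything from a specific paper or benchmark; the toy data, tiny model, learning rate, and patience value are all arbitrary assumptions). Each model just trains until its validation loss stops improving and keeps its best checkpoint, so the stopping point falls wherever that particular model happens to converge.

```python
# Minimal early-stopping sketch: train until validation loss stops improving,
# then restore the best checkpoint. Toy data/model/hyperparameters are arbitrary.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = 3x + noise, split into train and validation sets
x = torch.rand(256, 1)
y = 3 * x + 0.1 * torch.randn(256, 1)
x_train, y_train, x_val, y_val = x[:200], y[:200], x[200:], y[200:]

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

best_val, best_state, patience, bad_epochs = float("inf"), None, 10, 0

for epoch in range(1000):
    # One full-batch training step per "epoch" on this tiny dataset
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()

    # Evaluate on held-out data and remember the best weights seen so far
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # no improvement for `patience` epochs
            print(f"stopped at epoch {epoch}, best val loss {best_val:.4f}")
            break

model.load_state_dict(best_state)  # restore the best checkpoint
```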
AccountGotLocked69 t1_jcesw8m wrote
I assume by "hallucinate the gaps" you mean interpolate? In general it's the opposite: smaller, simpler models are better at generalizing. Of course there are a million exceptions to this rule, but in the simple picture of using stable combinations of batch sizes and learning rates, big models will be more prone to overfit the data. Most of this rests on the assumption that the "ground truth" is always a simpler function than memorizing the entire dataset.
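To put a toy example behind that last point (this is my own sketch, not anything from the thread; the dataset size, noise level, and polynomial degrees are arbitrary assumptions): fit a small and a large polynomial "model" to the same noisy data whose ground truth is a simple linear function, and compare training error to held-out error.

```python
# Toy capacity comparison: a low-degree and a high-degree polynomial are fit
# to the same small noisy dataset whose ground truth is simply y = 2x.
import numpy as np

rng = np.random.default_rng(0)

x_train = rng.uniform(-1, 1, 25)
y_train = 2 * x_train + rng.normal(0, 0.2, 25)   # simple ground truth + noise
x_test = rng.uniform(-1, 1, 200)
y_test = 2 * x_test                              # noise-free held-out targets

for degree in (1, 12):  # small model vs. deliberately over-parameterized model
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

Typically the high-degree fit drives the training error down while the held-out error goes up, which is the overfitting behavior described above; with enough clean data the gap shrinks, which connects back to the point earlier in the thread about needing massive amounts of data.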