Creepy-Tackle-944 t1_iqow79w wrote
Hard to answer.
A few years ago my answer would have been a resounding "hell no"; back then a batch size of 64 was considered large.
Today the training configurations of top-performing models commonly sit in the ballpark of 4096 images per batch, which I never thought I would see.
This shows that batch size doesn't really exist in a vacuum; it coexists with the other hyperparameters. For efficiency, doing everything in one batch would be desirable since everything stays in memory, but actually doing so would require coming up with an entirely new set of hyperparameters to go with it.
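One commonly cited example of that coupling is the linear scaling rule, where the learning rate is grown proportionally with the batch size. A minimal sketch (the base values here are made up purely for illustration, not anything from this thread):

```python
# Linear scaling rule: when the batch size grows by a factor k,
# scale the base learning rate by the same factor k.
# Base values below are arbitrary, chosen only to illustrate the idea.
base_batch_size = 256
base_lr = 0.1

def scaled_lr(batch_size: int) -> float:
    """Learning rate adjusted proportionally to the batch size."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(4096))  # 1.6 -- a 16x larger batch gets a 16x larger LR
```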
Also, gradient accumulation is a thing: you can in theory treat an entire training epoch as a single effective batch without running OOM, but nobody has found that to be effective yet.
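For reference, a minimal gradient-accumulation sketch in PyTorch (the toy model, random data, and batch sizes are placeholders, not from any real training setup):

```python
import torch
from torch import nn

# Toy setup -- model, data, and sizes are for illustration only.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

micro_batch = 16
accum_steps = 8  # effective batch size = micro_batch * accum_steps = 128

optimizer.zero_grad()
for step in range(32):  # pretend these are mini-batches from a DataLoader
    inputs = torch.randn(micro_batch, 10)
    targets = torch.randint(0, 2, (micro_batch,))
    loss = criterion(model(inputs), targets)
    # Scale the loss so the accumulated gradient matches the average gradient
    # of one large batch of size micro_batch * accum_steps.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Pushing accum_steps up to the point where one optimizer step covers a whole epoch is exactly the "epoch as a single batch" scenario; the memory cost stays at one micro-batch, it just hasn't been shown to train well.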