dasayan05 t1_iqnltrg wrote
Mini-batches are not there just because of memory limitations. They inject noise into the optimization, which helps it escape local minima and explore the loss landscape.
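To make the "noise" point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: a toy 1-D regression `y ≈ w * x`, synthetic data, and the hypothetical helper names `grad` and `minibatch_grads`. It only shows that each mini-batch gradient is a noisy estimate scattered around the full-batch gradient.

```python
import random

def grad(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y ≈ w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def minibatch_grads(w, xs, ys, batch_size, rng):
    """Shuffle the data once and return one gradient per mini-batch."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    grads = []
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        grads.append(grad(w, [xs[i] for i in batch], [ys[i] for i in batch]))
    return grads

# Synthetic data (assumed for illustration): true slope 3.0 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

w = 0.0
full = grad(w, xs, ys)                    # full-batch gradient
mb = minibatch_grads(w, xs, ys, 8, rng)   # 8 equal-sized mini-batch gradients
mean_mb = sum(mb) / len(mb)               # matches `full` for equal-sized batches
spread = max(mb) - min(mb)                # how much individual batches disagree

print("full-batch gradient:", full)
print("mean of mini-batch gradients:", mean_mb)
print("spread across mini-batches:", spread)
```

With equal-sized batches that partition the data, the mini-batch gradients average out to the full-batch gradient, while each individual one deviates from it. That deviation is the noise being discussed.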
029187 OP t1_iqp6s79 wrote
what if, as another poster said, we did full batch but also injected noise into it?
dasayan05 t1_iqp8hnf wrote
Possible, but what is the advantage of that? Even if we found a way to explicitly add noise to the data or the gradient, we would still be better off with mini-batches, since they use less memory.
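For reference, the most naive version of "full batch plus explicit noise" might look like the sketch below. It is only an assumption about what such a scheme could be: it adds Gaussian noise to the full-batch gradient on a toy 1-D regression, and the names `full_grad`, `noisy_full_batch_gd`, and `noise_std` are hypothetical. It is not claimed to reproduce the noise structure of mini-batch SGD.

```python
import random

def full_grad(w, xs, ys):
    """Full-batch gradient of mean squared error for y ≈ w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def noisy_full_batch_gd(xs, ys, lr=0.1, noise_std=0.05, steps=500, seed=0):
    """Gradient descent on the full batch with Gaussian noise added to each gradient."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # Explicitly injected noise (an assumed choice: isotropic Gaussian).
        g = full_grad(w, xs, ys) + rng.gauss(0.0, noise_std)
        w -= lr * g
    return w

# Synthetic data (assumed for illustration): true slope 3.0 plus small noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]

w_noisy = noisy_full_batch_gd(xs, ys)                  # with injected noise
w_clean = noisy_full_batch_gd(xs, ys, noise_std=0.0)   # plain full-batch GD

print("noisy full-batch estimate:", w_noisy)
print("plain full-batch estimate:", w_clean)
```

Note that every step here still touches the whole dataset, so the memory cost of full-batch training is unchanged; only the noise is added on top.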
029187 OP t1_iqrinm2 wrote
If it's only as good, then it has no benefit. But if it ends up being better, then it is useful in situations where we have enough memory.

https://arxiv.org/abs/2103.17182

This paper claims they may have found interesting ways to make it better.
Red-Portal t1_iqpczjb wrote
People have tried it, and so far no one has been able to achieve the same effect. It's still somewhat of an open research problem.
029187 OP t1_iqpigzv wrote
ah cool! do you have any links to papers on the topic? i'd love to read them!
Red-Portal t1_iqpipq6 wrote
I think it was this one: https://arxiv.org/abs/2103.17182
029187 OP t1_iqrihvc wrote
thanks!!