[Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? Submitted by 029187 (t3_xt0h2k) on October 1, 2022 at 5:01 PM in MachineLearning
Cheap_Meeting t1_iqq8oku wrote on October 2, 2022 at 8:58 AM Adding to the other answers: even if you had enough memory, it would still be computationally inefficient. There are diminishing returns from increasing the batch size in terms of how much the loss improves per step.
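A minimal sketch of the diminishing-returns point (my own toy example, not from the comment): on a least-squares problem, compare the final loss reached by different batch sizes when each run is given the same compute budget, measured as total examples processed. The problem size, learning rate, and budget are arbitrary choices for illustration.

```python
# Toy illustration (assumed setup, not from the thread): per unit of compute,
# small mini-batches often reduce the loss faster than full-batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)
n, d = 4096, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def loss(w):
    # Mean squared error over the full dataset.
    return 0.5 * np.mean((X @ w - y) ** 2)

def run(batch_size, examples_budget=200_000, lr=0.05):
    # Plain (mini-batch) gradient descent with a fixed budget of examples processed,
    # so full-batch gets far fewer parameter updates than small batches.
    w = np.zeros(d)
    seen = 0
    while seen < examples_budget:
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
        seen += batch_size
    return loss(w)

for bs in (32, 256, n):  # n == full batch
    print(f"batch={bs:4d}  loss after equal compute: {run(bs):.4f}")
```

With the same number of examples touched, the full-batch run takes very few steps, so the loss per unit of compute improves more slowly than with smaller batches, which is the diminishing-returns effect the comment describes.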