Ephemeral_Epoch t1_iqnscns wrote
Reply to comment by ClearlyCylindrical in [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
Seems like you could approximate a minibatch gradient with the full-batch gradient + noise? Maybe there's a better noising procedure when using full-batch gradients.
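One way to see what "full batch + noise" misses is to compare the empirical covariance of minibatch gradients with isotropic noise of matched magnitude on a toy problem. Everything below (the linear-regression setup, the batch size, the sample counts) is a hypothetical illustration, not something from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data (hypothetical sizes).
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)  # evaluate gradients at the initial point

def full_grad(w):
    # Mean-squared-error gradient over the whole dataset.
    return 2 * X.T @ (X @ w - y) / n

def minibatch_grad(w, batch_size=32):
    # Gradient on a random subsample, as SGD would see it.
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ w - yb) / batch_size

g_full = full_grad(w)
samples = np.stack([minibatch_grad(w) for _ in range(5000)])

# Minibatch gradients are unbiased (their mean is the full gradient),
# but their covariance is anisotropic in general ...
emp_cov = np.cov(samples.T)

# ... whereas "full batch + Gaussian noise" with matched total variance
# gives an isotropic covariance:
iso_cov = emp_cov.trace() / d * np.eye(d)

print(np.round(emp_cov, 2))
print(np.round(iso_cov, 2))
```

The off-diagonal structure in `emp_cov` (and the spread among its diagonal entries) is exactly what a hand-picked isotropic noise term fails to reproduce.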
SNAPscientist t1_iqr3sej wrote
Capturing the distribution characteristics of high-dimensional data is very hard. In fact, if we could do that well, we might be able to use classic Bayesian techniques for many NN problems, which would be more principled and interpretable. Any noise one ends up adding by hand is unlikely to reproduce the kind of stochasticity that sampling real data (using minibatches or similar procedures) introduces. Getting the distribution wrong would likely mean poor generalization.
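To put a rough number on "very hard": even just matching the *covariance* of the gradient noise (never mind the full distribution) blows up at network scale. A back-of-the-envelope sketch, with a hypothetical parameter count:

```python
# Storing a full covariance matrix for the gradient noise needs d^2 floats.
d = 10_000_000                       # hypothetical: 10M-parameter network
bytes_per_float = 4                  # float32
cov_bytes = d * d * bytes_per_float  # full d x d covariance matrix
print(f"{cov_bytes / 1e12:.0f} TB")  # → 400 TB
```

So in practice any hand-added noise has to use a drastically simplified (e.g. diagonal or isotropic) model, which is precisely where it diverges from real minibatch sampling.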