The_humble_tortoise
The_humble_tortoise t1_ivpn0u5 wrote
Reply to [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
My take on your question is that it's basically the difference between SGD and mini-batch SGD. Averaging the gradient over a mini-batch reduces the variance of each update, so mini-batch SGD in general gives a more robust (less overfitting) and smoother (less zig-zag) solution, but it really depends on the data. More homogeneous data usually works fine with plain SGD; otherwise, mini-batch SGD will perform better.
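A quick way to see the "smoother" part is to measure how noisy the gradient estimate is with one point versus an averaged mini-batch. This is a minimal sketch, assuming a toy 1-D linear regression with squared-error loss. The data, the learning rate of 0.1, the batch size of 32, and all function names are my own illustrative choices. It only shows the variance reduction and the less zig-zaggy path; it says nothing either way about overfitting.

```python
import random
import statistics


def make_data(n=1000, true_w=3.0, noise=0.5, seed=0):
    """Toy 1-D regression data: y = true_w * x + Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


def grad(w, x, y):
    # Derivative of the per-point squared error (w*x - y)^2 with respect to w.
    return 2 * (w * x - y) * x


def gradient_estimates(w, xs, ys, batch_size, trials, rng):
    """Draw `trials` gradient estimates, each averaged over `batch_size` random points."""
    estimates = []
    for _ in range(trials):
        idx = [rng.randrange(len(xs)) for _ in range(batch_size)]
        estimates.append(sum(grad(w, xs[i], ys[i]) for i in idx) / batch_size)
    return estimates


def train(xs, ys, batch_size, lr=0.1, steps=200, seed=1):
    """Plain SGD (batch_size=1) or mini-batch SGD; returns the final w and every iterate."""
    rng = random.Random(seed)
    w = 0.0
    path = []
    for _ in range(steps):
        idx = [rng.randrange(len(xs)) for _ in range(batch_size)]
        g = sum(grad(w, xs[i], ys[i]) for i in idx) / batch_size
        w -= lr * g
        path.append(w)
    return w, path


if __name__ == "__main__":
    xs, ys = make_data()
    rng = random.Random(42)
    for b in (1, 32):
        est = gradient_estimates(0.0, xs, ys, b, trials=2000, rng=rng)
        print(f"batch={b:2d}  gradient variance={statistics.pvariance(est):.4f}")
    for b in (1, 32):
        w, path = train(xs, ys, b)
        # Spread of the last 50 iterates: how much w still "zig-zags" near the end.
        late_std = statistics.pstdev(path[-50:])
        print(f"batch={b:2d}  final w={w:.3f}  late-iterate std={late_std:.4f}")
```

Averaging B independent per-point gradients cuts the variance of the estimate by roughly a factor of B, which is where the smoother path comes from; the trade-off is B times more computation per step.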
The_humble_tortoise t1_j1teb45 wrote
Reply to [D] ANN for sine wave prediction by T4KKKK
https://www.resceu.s.u-tokyo.ac.jp/workshops/resceu20s/docs/hartwig.pdf
I guess you could try this. I've always wanted to try it myself to see if it works, but admittedly I'm lazy.