BitShin t1_ivsjvcr wrote
Reply to [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
In addition to providing a smoother descent, computing the average gradient over many samples is highly parallelizable. So on a modern GPU, taking the gradient at 50 points is not 50 times more expensive than taking it at a single point. In fact, if your model is small enough, the two can take roughly the same amount of time.
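A minimal timing sketch of this point (not from the original comment), assuming PyTorch and a CUDA-capable GPU: it times one backward pass through a small MLP at batch size 1 vs 50, and on most GPUs the two come out close in wall-clock time because the per-sample work fits comfortably within the GPU's parallelism. Model size, layer widths, and repetition counts are arbitrary choices for illustration.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Small model: per-sample work easily fits the GPU's parallel capacity.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
loss_fn = nn.MSELoss()

def time_backward(batch_size, reps=100):
    x = torch.randn(batch_size, 128, device=device)
    y = torch.randn(batch_size, 1, device=device)
    # Warm-up so kernel launches and allocations don't skew the timing.
    for _ in range(10):
        model.zero_grad()
        loss_fn(model(x), y).backward()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        model.zero_grad()
        # The loss averages over the batch, so .backward() gives the average gradient.
        loss_fn(model(x), y).backward()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps

print(f"batch size  1: {time_backward(1) * 1e3:.3f} ms per step")
print(f"batch size 50: {time_backward(50) * 1e3:.3f} ms per step")
```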