BitShin t1_ivsjvcr wrote
Reply to [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
In addition to providing a smoother descent, computing the average gradient over many samples is highly parallelizable. So on a modern GPU, taking the gradient at 50 points is not 50 times more expensive than taking it at a single point. In fact, if your model is small enough, the two can take roughly the same amount of time.
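A minimal timing sketch of this point (not from the original comment), assuming PyTorch and a CUDA-capable GPU: it times one backward pass through a small MLP at batch size 1 vs 50, and on most GPUs the two come out close in wall-clock time because the per-sample work fits comfortably within the GPU's parallelism. Model size, layer widths, and repetition counts are arbitrary choices for illustration.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Small model: per-sample work easily fits the GPU's parallel capacity.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
loss_fn = nn.MSELoss()

def time_backward(batch_size, reps=100):
    x = torch.randn(batch_size, 128, device=device)
    y = torch.randn(batch_size, 1, device=device)
    # Warm-up so kernel launches and allocations don't skew the timing.
    for _ in range(10):
        model.zero_grad()
        loss_fn(model(x), y).backward()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        model.zero_grad()
        # The loss averages over the batch, so .backward() gives the average gradient.
        loss_fn(model(x), y).backward()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / reps

print(f"batch size  1: {time_backward(1) * 1e3:.3f} ms per step")
print(f"batch size 50: {time_backward(50) * 1e3:.3f} ms per step")
```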