Submitted by CPOOCPOS t3_yql3wl in MachineLearning
Hello, I'm wondering whether, during learning tasks, there is an advantage in taking the average gradient over a small volume of parameter space, compared to just taking the gradient at the single center point of that volume.
Edit:
I noticed that I have confused some people, and rightly so, because I didn't mention some aspects. The computational overhead of averaging over that many points is not important here; the reason why is beside the point, but it has to do with the fact that this can be extremely cheap on a quantum computer. More precisely, my question is: given that the computational overhead can be ignored, are there general theoretical advantages to taking the average over a volume, compared to evaluating the gradient at a single point?
I am a physicist and have only some experience in ML. My thesis is currently in Quantum Machine Learning, and this question has been central to my research for some weeks. Unfortunately, I can't find anything online that addresses it. I was wondering if someone here has some insights into this question.
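For concreteness, here is a minimal sketch of the two quantities being compared: the gradient at the center point versus a Monte Carlo estimate of the gradient averaged over a small neighborhood. The toy loss, the Gaussian sampling (in place of a uniform ball), and the `radius`/`n_samples` values are all illustrative assumptions, not part of the original question.

```python
import jax
import jax.numpy as jnp

def loss(theta):
    # Toy non-quadratic loss, standing in for a real training objective.
    return jnp.sum(jnp.sin(theta) + 0.1 * theta ** 2)

grad_loss = jax.grad(loss)

def volume_averaged_grad(theta, key, radius=0.1, n_samples=256):
    # Monte Carlo average of the gradient over a small neighborhood:
    # sample offsets around theta (Gaussian here, as an assumption)
    # and average the gradients at the perturbed points.
    eps = radius * jax.random.normal(key, (n_samples,) + theta.shape)
    grads = jax.vmap(lambda e: grad_loss(theta + e))(eps)
    return grads.mean(axis=0)

theta = jnp.array([1.0, -2.0, 0.5])
key = jax.random.PRNGKey(0)
print(grad_loss(theta))                  # gradient at the center point
print(volume_averaged_grad(theta, key))  # gradient averaged over the neighborhood
```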
bluuerp t1_ivov64y wrote
The gradient gives you the optimal improvement direction... if you have 10 positions, the gradients at all 10 will point in different directions, so if you take a step after each point you'll zig-zag around a lot. You might even backtrack a bit. If you instead take the average of all 10 and do a single step, you won't be optimal with respect to each point individually, but the path you take will be smoother (a minimal sketch of this is below).
So it depends on your dataset. Usually you want some smoothing, because otherwise you won't converge as easily.
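A hedged sketch of the zig-zag argument, with an assumed toy setup (the 10 "positions", the quadratic per-point loss, and the learning rate are illustrative, not from the comment): stepping after every individual gradient versus taking one step on their average.

```python
import jax
import jax.numpy as jnp

def point_loss(theta, x):
    # Per-point loss: each point x pulls theta toward itself.
    return jnp.sum((theta - x) ** 2)

grad_point = jax.grad(point_loss)

xs = jax.random.normal(jax.random.PRNGKey(1), (10, 2))  # 10 "positions"
theta0 = jnp.zeros(2)
lr = 0.1

# Variant A: a step after every single point -> the iterate zig-zags,
# and the final position depends on the order of the points.
theta_a = theta0
for x in xs:
    theta_a = theta_a - lr * grad_point(theta_a, x)

# Variant B: one step on the averaged gradient -> smoother and order-independent.
avg_grad = jax.vmap(lambda x: grad_point(theta0, x))(xs).mean(axis=0)
theta_b = theta0 - lr * avg_grad

print(theta_a, theta_b)
```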
The same is true for your example... the center point might not be a good estimate of its surroundings. It could, however, be that it is close to the average, in which case there isn't that big a difference.
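One way to make this last point precise (a standard smoothing/Taylor argument, added here as a sketch rather than something from the thread): for a sufficiently smooth loss L and a symmetric zero-mean perturbation, the volume-averaged gradient equals the gradient of a smoothed loss, and it differs from the center gradient only at second order in the sampling scale σ:

```latex
\mathbb{E}_{\varepsilon \sim \mathcal{N}(0,\,\sigma^2 I)}\!\big[\nabla L(\theta + \varepsilon)\big]
  = \nabla\,\mathbb{E}_{\varepsilon}\!\big[L(\theta + \varepsilon)\big]
  = \nabla L(\theta) + \frac{\sigma^2}{2}\,\nabla(\Delta L)(\theta) + O(\sigma^4)
```

So for a small sampling radius the averaged gradient nearly coincides with the center gradient, while for larger radii it behaves like the gradient of a mollified loss, which matches the smoothing intuition above.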