CPOOCPOS OP t1_ivp0tap wrote
Reply to comment by jnez71 in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Thanks for your reply, jnez! Yes, I have actually also had the thought of using the average over many local points to estimate the local curvature, like what BFGS needs.
You are right that in a classical setting there are far better things to do with many adjacent gradient computations. But here I am doing machine learning on a quantum computer, and the interesting part is that it is very cheap to compute the average (and only the average) over many points. To be more concrete about the computational cost: it takes only linear effort to compute the average over an exponential number of points.
When I was first developing the idea, I thought of the procedure as a vote among a bunch of local points on which direction they would like to go. But now I am looking for more concrete theoretical arguments for why it makes sense to take the average gradient (since, on a quantum computer, I wouldn't have the computational overhead this carries on a classical machine).
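To make the "vote" picture concrete, here is roughly what the procedure looks like when simulated classically. The toy objective, ball radius, and sample count are arbitrary illustrations; classically, the inner averaging loop is exactly the overhead that the quantum computer would remove:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    # Gradient of a toy nonconvex objective f(x) = ||x||^2 / 2 + sum(cos(3x))
    return x - 3.0 * np.sin(3.0 * x)

def averaged_gradient(x, radius=0.1, n_samples=256):
    # Average the gradient over points drawn uniformly from a ball around x.
    # Classically this costs n_samples gradient evaluations per step; the
    # quantum setting described above is what would make this average cheap.
    d = x.shape[0]
    u = rng.normal(size=(n_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)                # uniform directions
    r = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)   # uniform-in-ball radii
    return grad_f(x + r * u).mean(axis=0)

x = np.array([2.0, -1.5])
for _ in range(300):
    x -= 0.05 * averaged_gradient(x)   # each local point "votes" via its gradient
print(x)
```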
jnez71 t1_ivp3sju wrote
Hm, there may be a way to exploit that cheap average-gradient computation to still get curvature information, which can help a lot.
I am reminded of how a covariance matrix is really just composed of means: cov[g,g] = E[gg'] - E[g]E[g'] (where ' is transpose). If g is distributed as the gradients in your volume, I suspect that cov[g,g] is related to the Hessian, and you can get that covariance with basically just averages of g.
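For the special case of a quadratic f(x) = 0.5 x'Hx, where the gradient is exactly g(x) = Hx, this is easy to check: if the sample points have covariance Sigma, then cov[g,g] = H Sigma H', so the Hessian is in principle recoverable from gradient averages alone. A minimal classical sanity check, with H, the ball radius, and the sample count all chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Quadratic toy: f(x) = 0.5 * x' H x, so the gradient is exactly g(x) = H x
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# Sample points uniformly from a small ball around x0
x0 = np.array([1.0, -1.0])
n, d, radius = 200_000, 2, 0.1
u = rng.normal(size=(n, d))
u /= np.linalg.norm(u, axis=1, keepdims=True)
pts = x0 + radius * rng.uniform(size=(n, 1)) ** (1.0 / d) * u

g = pts @ H.T                                          # gradient at every sample point
cov_g = g.T @ g / n - np.outer(g.mean(0), g.mean(0))   # E[gg'] - E[g]E[g']
Sigma = np.cov(pts.T, bias=True)                       # covariance of the sample points

# For the quadratic, cov[g,g] = H Sigma H', so the two prints should agree up
# to sampling noise; Sigma of a uniform ball is known (radius^2/(d+2) * I),
# which is what would let you back the Hessian out of gradient averages alone.
print(np.round(cov_g, 5))
print(np.round(H @ Sigma @ H.T, 5))
```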
More intuitively I'm thinking, "in this volume, how much on average does the gradient differ from the average gradient." If your quantum computer really makes that volume averaging trivial, then I suspect someone would have come up with this as some kind of "quantum Newton's method."
I think that's all I got for ya. Good luck!
CPOOCPOS OP t1_ivp4z7j wrote
Thanks!! Wish you a good day!!
jnez71 t1_ivp6ril wrote
Oh, I should add that from a nonconvex-optimization perspective, the volume averaging could provide heuristic benefits akin to those of GD+momentum-type optimizers. (Edited my first comment to reflect this.)
Try playing around with your idea in low dimensions on a classical computer to get a feel for it first. Might help you think of new ways to research it.
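For example, a tiny 1-D comparison along those lines might look like this (the rippled objective, averaging radius, and step size are all made-up illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D nonconvex toy: a parabola with cosine ripples on top
df = lambda x: 2.0 * x - 10.0 * np.sin(5.0 * x)   # gradient of x^2 + 2*cos(5x)

def run(x, averaged, steps=500, lr=0.02, radius=0.6, n=64):
    for _ in range(steps):
        g = df(x + radius * rng.uniform(-1.0, 1.0, n)).mean() if averaged else df(x)
        x -= lr * g
    return x

x0 = 2.0
print("plain GD:   ", run(x0, averaged=False))  # prone to stalling in a ripple minimum
print("averaged GD:", run(x0, averaged=True))   # averaging damps the oscillatory term
                                                # (here by sin(5r)/(5r) ~ 0.05), so it
                                                # settles much nearer the minimum at 0
```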