[D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point
Submitted by CPOOCPOS t3_yql3wl on November 9, 2022 at 2:52 PM in MachineLearning 35 comments 38
name_not_acceptable t1_ivsi5i1 wrote on November 10, 2022 at 7:46 AM
Wouldn't you be better off using the parallel computation to calculate the gradient at lots of random starting points and then seeing whether you can find a better local minimum? I.e., check whether the runs all converge to different points.
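A minimal sketch of the multi-start idea suggested above, in plain Python. The loss `f`, its gradient `grad_f`, the learning rate, the step count, and the sampling range are all hypothetical choices made for illustration; nothing here comes from the original poster's setup. The runs are executed one after another, but each is independent, so they could be dispatched in parallel.

```python
import random

def f(x):
    # Hypothetical non-convex 1-D loss with two local minima (illustration only).
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Analytic gradient of the hypothetical loss above.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain gradient descent from a single starting point.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

def multi_start(n_starts=20, low=-2.0, high=2.0, seed=0):
    # Launch independent descents from random starting points
    # and keep the end point with the lowest loss.
    rng = random.Random(seed)
    starts = [rng.uniform(low, high) for _ in range(n_starts)]
    finals = [gradient_descent(x0) for x0 in starts]
    best = min(finals, key=f)
    return finals, best

finals, best = multi_start()
# Rounding groups runs that ended in the same basin.
distinct = sorted({round(x, 3) for x in finals})
print("distinct converged points:", distinct)
print("best point:", round(best, 3), "loss:", round(f(best), 3))
```

If `distinct` contains more than one value, the runs converged to different minima, which is the check the comment proposes; `best` is then the lowest of the minima that were found, not a guaranteed global minimum.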