
dpkingma t1_irpaffs wrote

I agree with the (milder) claim that most methods in the wild are empirical.

With regard to Adam: I wouldn't say that this method was ad hoc. The method was motivated by the ideas in Sections 2 and 3 of the paper (update rule / initialization bias correction), which are correct. The convergence result in Section 4 should be ignored, but it didn't play a role in the conception of the problem, and it wasn't very relevant in practice anyway due to the convexity assumption.
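For readers who haven't looked at those sections, here is a minimal sketch of the update rule with initialization bias correction, for a single scalar parameter. This code is not from the comment: the function name `adam_step` and its argument layout are made up for illustration, and the defaults are the ones suggested in the Adam paper.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (illustrative helper, not a library API).

    t is the 1-based step count; m and v start at 0.0.
    """
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad

    # Initialization bias correction: m and v start at zero, so early
    # averages are biased toward zero; dividing by (1 - beta^t) undoes that.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Parameter update, scaled by the corrected second-moment estimate.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

With the correction, the very first step has magnitude of roughly `lr` regardless of the gradient's scale. Without it, the same step would be about (1 - beta1) / sqrt(1 - beta2) ≈ 3.16 times larger with these defaults.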
