dpkingma

dpkingma t1_irpaffs wrote

I agree with the (milder) claim that most methods in the wild are empirical.

With regard to Adam: I wouldn't say that this method was ad hoc. The method was motivated by the ideas in Sections 2 and 3 of the paper (update rule / initialization bias correction), which are correct. The convergence result in Section 4 should be ignored, but didn't play a role in the conception of the problem, and wasn't very relevant in practice anyway due to the convexity assumption.
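For readers who haven't seen the update rule and bias correction mentioned above, here is a minimal scalar sketch. The function name `adam_step` and the toy quadratic objective are illustrative assumptions; the default hyperparameters are the commonly cited ones. It is a sketch, not a reference implementation.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (illustrative sketch).

    m and v are running averages of the gradient and squared gradient.
    t is the 1-based step counter used for bias correction.
    """
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # m and v start at zero, so early estimates are biased toward zero;
    # dividing by (1 - beta**t) corrects for that initialization bias.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Hypothetical usage: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
print(x)
```

On the first step the corrected estimates equal `grad` and `grad**2`, so the update has magnitude of roughly `lr` regardless of the gradient's scale. Without the correction, the first steps would be scaled by the still-small moving averages instead.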

11

dpkingma t1_irooyas wrote

Author of VAE/diffusion model papers here.

When a paper introduces something as intractable, it typically means that it can't be computed exactly in a computationally feasible way, which forms a motivation for using approximations. The challenge is then to come up with an approximation that has nice properties (e.g. computationally cheap, unbiased, is a bound, etc.).
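To make "is a bound" concrete, here is a small sketch using a made-up toy model: one binary latent variable `z` and a single observation `x`. All numbers and function names are assumptions chosen for illustration. With only two latent values the marginal likelihood can be summed exactly, which is precisely what stops being feasible when `z` is high-dimensional or continuous. The evidence lower bound (ELBO) needs only an approximate posterior `q` and never exceeds the exact value.

```python
import math

# Hypothetical toy model: binary latent z and one observed x.
prior = [0.5, 0.5]        # p(z = 0), p(z = 1)
likelihood = [0.9, 0.2]   # p(x | z = 0), p(x | z = 1) for the observed x

def log_marginal(prior, likelihood):
    """Exact log p(x) = log sum_z p(z) p(x | z). Tractable here: only 2 terms."""
    return math.log(sum(p * l for p, l in zip(prior, likelihood)))

def elbo(q, prior, likelihood):
    """ELBO = sum_z q(z) * [log p(x, z) - log q(z)], for any distribution q over z."""
    return sum(
        qz * (math.log(p * l) - math.log(qz))
        for qz, p, l in zip(q, prior, likelihood)
        if qz > 0
    )

def exact_posterior(prior, likelihood):
    """p(z | x) by Bayes' rule; only available because the sum is tiny."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

exact = log_marginal(prior, likelihood)
print(exact)                                                  # exact log-likelihood
print(elbo([0.5, 0.5], prior, likelihood))                    # lower, since q is crude
print(elbo(exact_posterior(prior, likelihood), prior, likelihood))  # matches exact
```

The gap between the ELBO and the exact value is the KL divergence from `q` to the true posterior, so the bound becomes tight when `q` equals the posterior.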

As another commenter also wrote, how an idea is presented in the paper is typically different from how the author(s) came up with the idea. At the start of a research project you often start with a concrete problem and some vague intuitions on how to solve it. Then through an iterative process you refine your ideas. Often it turns out that what you wanted to do is impossible. Often it turns out that a solution already exists so there's no need to publish. The process often requires lots of backtracking, scrapping dead ends, and it often requires time before an idea finally 'clicks' in your mind (which is a great feeling). Especially when you start out in research, nine out of ten ideas turn out to be either already solved, or (seemingly) unsolvable, which can be very frustrating. And since negative results are typically unpublishable or uninteresting, you don't read about them. The flip side is that with more experience, you build a bigger mental toolbox, making it easier to spot dead ends and see opportunities. The best way to get there is to read ML books (or other media) that teach you the underlying math, and lots of practice.

154

dpkingma t1_irojnp3 wrote

VAE author here.

The parent comment (by picardythird) is, in my experience, a false take on ML. VAEs and many other methods in ML are actually derived from first principles, which means: start with an idea, do the math/derivations first (which can take considerable time) until you get them right, then implement later.

It's true that a lot of work in ML is different, e.g. hacking up new architectures, testing against benchmarks, and coming up with a theory later. It may be that parent's experience is more along these lines. But that's not how VAEs (and I think, diffusion models) were conceived, and certainly not 99.99% of the methods.

(I'll try to answer OP in a separate comment.)

37