
dpkingma t1_irojnp3 wrote

VAE author here.

The parent's (picardythird's) comment is, in my experience, a false take on ML. VAEs and many other methods in ML actually are derived from first principles. Which means: start with an idea, do the math/derivations first (which can take considerable time) until you get them right, then implement later.
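
For concreteness: the VAE objective falls out of a short variational argument before any implementation enters. A standard sketch of that derivation (my notation, not quoted from the paper; q_phi is the approximate posterior, p_theta the generative model):

```latex
\log p_\theta(x)
  = \log \mathbb{E}_{q_\phi(z \mid x)}\!\left[\frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]
  \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]
     \;-\; D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```

by Jensen's inequality; the right-hand side is the ELBO, maximized jointly over theta and phi. That's the kind of math-first step being described here.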

It's true that a lot of work in ML is different, e.g. hacking up a new architecture, testing it against benchmarks, and coming up with a theory later. It may be that the parent's experience is more along these lines. But that's not how VAEs (and, I think, diffusion models) were conceived, and it's certainly not true of 99.99% of methods.

(I'll try to answer OP in a separate comment.)

37

picardythird t1_iroxxxd wrote

I'll admit that VAEs are one of the (imo very few) methods that are motivated from first principles. The original VAE paper is actually one of my favorite papers in all of ML; it's well-structured, easy to read, and the steps leading from problem to solution are clear.

This is not at all the case for the vast majority of ML research. How many times have we seen "provably optimal" bounds on something that are then shown to, whoops, not actually be optimal a few years (or months! or weeks!) later? How many times have we seen "theoretical proofs" for why something works where it turns out that, whoops, the proofs were wrong (looking at you, Adam)? Perhaps most damningly, how many times have we seen a mathematical model or structure that seems to work well in the lab completely fail when exposed to real-world data that does not follow the extraordinarily convenient assumptions and simplifications serving as its foundational underpinnings? This last point is extremely common in nonstandard fields such as continual learning, adversarial ML, and noisy-label learning.

Most certainly, some (perhaps even "much"... I hesitate to say "most" or "all") of the work that comes out of the top top top tier labs is motivated by theory before practice. However, outside of these extraordinary groups it is plainly evident that most methods you see in the wild are empirical, not theoretical.

12

dpkingma t1_irpaffs wrote

I agree with the (milder) claim that most methods in the wild are empirical.

With regard to Adam: I wouldn't say that this method was ad hoc. The method was motivated by the ideas in Sections 2 and 3 of the paper (the update rule and the initialization bias correction), which are correct. The convergence result in Section 4 should be ignored, but it didn't play a role in the conception of the method, and it wasn't very relevant in practice anyway due to the convexity assumption.
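
For reference, the update rule and bias correction from Sections 2 and 3 go roughly like this (a minimal NumPy sketch with the paper's default hyperparameters; the function name and the toy objective are mine):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moment estimates plus initialization-bias correction."""
    m = beta1 * m + (1 - beta1) * grad      # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1**t)              # correct the bias from zero initialization
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Usage sketch: minimize f(theta) = ||theta||^2 from a random start.
theta = np.random.randn(3)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):                    # t starts at 1 so the corrections are defined
    grad = 2 * theta                        # gradient of ||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)                                # approaches the zero vector
```

The bias correction (the m_hat/v_hat lines) is exactly the Section 3 idea: without it, the zero-initialized moment estimates are biased toward zero early in training.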

11