picardythird t1_irnu71n wrote
Let's be honest, 99.99% of the time the math is a post-hoc, hand-wavy description of the architecture that happened to work after dozens or hundreds of iterations of architecture search. End up with an autoencoder? Great, you have a latent space. Did you perform normalizations or add noise? Fantastic, you can make (reductive and unjustified) claims about probabilistic interpretations using simple distributions. Know some linear algebra? Awesome, you can take pretty much any vaguely accurate observation about the architecture and use it to support whatever interpretation you want.
There is a vanishingly small (but admittedly nonzero) amount of research that starts with a theoretical formulation and designs a hypothesis test around it. Much, much, much more common is that researchers start with a high-level (but ultimately heuristic) systems approach to architecture selection, then twiddle and tweak the hyperparameters until the numbers look good enough to publish. Then, maybe, they'll look back and conjure up some vague bullshit involving math notation and claim it as "theoretical justification," knowing that 99% of the time no one is going to read it, much less check their work. Bonus points if they make the notation intentionally confusing, so as to dissuade people from looking too closely. If you can mix in as many fonts, subscripts, superscripts, and set-notation symbols as possible, most people's eyes will glaze over and they'll assume you're right.
It's sad, but it's the reality of the state of our field.
Edit: As for intractable equations, some people are going to answer with talk of variational inference and theoretical guarantees about lower bounds. This is nice and all, but the real answer is that "neural networks go brrr." Neural networks are used as function approximators, and except for some high level architectural descriptions, no one actually knows how or why neural networks learn the functions that they do (or even if they learn the intended functions at all). Anyone that tells you otherwise is selling you something.
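(For reference, the intractable quantity in question is the marginal likelihood, and the lower bound people invoke is, in a minimal sketch:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)$$

The bound itself is exact math; the "go brrr" part is that both $q_\phi$ and $p_\theta$ end up as neural networks we can't say much about.)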
Edit 2: Lol, it's always hilarious getting downvotes from people who can't stand being called out for their bad practices.
dpkingma t1_irojnp3 wrote
VAE author here.
The parent's (picardythird's) comment is, in my experience, a false take on ML. VAEs and many other methods in ML are actually derived from first principles. Which means: start with an idea, do the math/derivations first (which can take considerable time) until you get them right, then implement later.
It's true that a lot of work in ML is different, e.g. hacking up new architectures, testing against benchmarks, and coming up with a theory later. It may be that the parent's experience is more along these lines. But that's not how VAEs (and, I think, diffusion models) were conceived, and it's certainly not 99.99% of methods.
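As a rough illustration of where that derivation lands in code, here is a minimal sketch of the reparameterized VAE objective (not the paper's reference implementation; `encoder` and `decoder` are assumed stand-in networks, with a Bernoulli decoder for concreteness):

```python
import torch

def vae_loss(x, encoder, decoder):
    # Encoder outputs the parameters of q(z|x) = N(mu, diag(sigma^2)).
    mu, log_var = encoder(x)

    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # so gradients flow through mu and log_var.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # Reconstruction term E_q[log p(x|z)]; Bernoulli likelihood as a stand-in,
    # so x and x_hat are assumed to lie in [0, 1].
    x_hat = decoder(z)
    recon = torch.nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")

    # KL(q(z|x) || N(0, I)) in closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    # Negative ELBO: minimize reconstruction error plus KL.
    return recon + kl
```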
(I'll try to answer OP in a separate comment.)
picardythird t1_iroxxxd wrote
I'll admit that VAEs are one of the (imo very few) methods that are motivated from first principles. The original VAE paper is actually one of my favorite papers in all of ML; it's well-structured, easy to read, and the steps leading from problem to solution are clear.
This is not at all the case for the vast majority of ML research. How many times have we seen "provably optimal" bounds on something that are then shown to, whoops, not actually be optimal a few years (or months! or weeks!) later? How many times have we seen "theoretical proofs" for why something works, where it turns out that, whoops, the proofs were wrong (looking at you, Adam)? Perhaps most damningly, how many times have we seen a mathematical model or structure that seems to work well in a lab completely fail when exposed to real-world data that does not follow the extraordinarily convenient assumptions and simplifications serving as its foundational underpinnings? This last point is extremely common in nonstandard fields such as continual learning, adversarial ML, and noisy-label learning.
Most certainly, some (perhaps even "much"... I hesitate to say "most" or "all") of the work that comes out of the top top top tier labs is motivated by theory before practice. However, outside of these extraordinary groups it is plainly evident that most methods you see in the wild are empirical, not theoretical.
dpkingma t1_irpaffs wrote
I agree with the (milder) claim that most methods in the wild are empirical.
With regards to Adam: I wouldn't say that method was ad hoc. It was motivated by the ideas in Sections 2 and 3 of the paper (the update rule / initialization bias correction), which are correct. The convergence result in Section 4 should be ignored, but it didn't play a role in the conception of the method, and it wasn't very relevant in practice anyway due to the convexity assumption.
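For reference, the update rule plus bias correction being referred to fits in a few lines; a minimal NumPy sketch (not the paper's pseudocode verbatim):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v start at zero; t is the 1-indexed step count."""
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2

    # Initialization bias correction: because m and v start at zero, the raw
    # averages are biased toward zero early in training; this rescales them.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)

    # Parameter update, with eps guarding against division by zero.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```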
scraper01 t1_irod27t wrote
Stable Diffusion's actual implementation doesn't make sense when compared with the paper's description.
astrange t1_irpk8s2 wrote
Do you have specific examples?
It's obviously true that diffusion models don't work for the reasons they were originally thought to; the Cold Diffusion paper shows this. Also, the Stable Diffusion explainers I've seen describe it as pixel diffusion even though it's latent diffusion. And I'm not sure I understand why latent diffusion works.
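(Roughly, the distinction those explainers blur, as a minimal sketch; all names here are illustrative stand-ins, not the actual Stable Diffusion API:)

```python
import torch

def sample_pixel_diffusion(unet, scheduler, shape):
    x = torch.randn(shape)  # start from Gaussian noise in pixel space
    for t in scheduler.timesteps:
        x = scheduler.step(unet(x, t), t, x)  # denoise pixels directly
    return x

def sample_latent_diffusion(unet, vae, scheduler, latent_shape):
    z = torch.randn(latent_shape)  # noise in the VAE's latent space instead
    for t in scheduler.timesteps:
        z = scheduler.step(unet(z, t), t, z)  # the expensive loop runs on latents
    return vae.decode(z)  # one decoder pass maps latents back to pixels
```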
aviisu OP t1_iro0trj wrote
Thanks for your insight. Indeed, I genuinely don't understand people who intentionally put a bunch of complex math in a paper instead of trying to guide readers with intuition and make the paper more accessible. But then again, I hear it gets accepted more easily that way, so...
master3243 t1_irq9k6g wrote
That's just how math is done in research. If you don't like that, then you'll hate pure math papers even more, where the paper states a theorem and then walks through the steps that establish it as true.
The intuition behind how the author came up with that line of thinking and arrived at the final theorem is (justifiably) left entirely in the author's scratch paper or notebooks.
Some authors do give insight into the steps they took or their general intuition, which is always nice, but it's not a requirement.
It's also worth mentioning that a lot of us like doing research but don't like writing research papers (which are just a necessity, given that humans lack telepathic communication), so giving more info is an optional step in a disliked process, which makes it understandable why it's often skipped.