picardythird t1_irnu71n wrote
Let's be honest, 99.99% of the time the math is a post-hoc, hand-wavy description of the architecture that happened to work after dozens or hundreds of iterations of architecture search. End up with an autoencoder? Great, you have a latent space. Did you perform normalizations or add noise? Fantastic, you can make (reductive and unjustified) claims about probabilistic interpretations using simple distributions. Know some linear algebra? Awesome, you can take pretty much any vaguely accurate observation about the architecture and use it to support whatever interpretation you want.
There is a vanishingly small (but admittedly nonzero) amount of research that starts with a theoretical formulation and designs a hypothesis test around it. Much, much, much more common is that researchers start with a high-level (but ultimately heuristic) systems approach to architecture selection, then twiddle and tweak the hyperparameters until the numbers look good enough to publish. Then, maybe, they'll look back and conjure up some vague bullshit involving math notation and claim it as "theoretical justification," knowing that 99% of the time no one is going to read it, much less check their work. Bonus points if they make the notation intentionally confusing, so as to dissuade people from looking too closely. If you can mix in as many fonts, subscripts, superscripts, and set-notation symbols as possible, most people's eyes will glaze over and they'll assume you're right.
It's sad, but it's the reality of the state of our field.
Edit: As for intractable equations, some people are going to answer with talk of variational inference and theoretical guarantees about lower bounds. This is nice and all, but the real answer is that "neural networks go brrr." Neural networks are used as function approximators, and except for some high level architectural descriptions, no one actually knows how or why neural networks learn the functions that they do (or even if they learn the intended functions at all). Anyone that tells you otherwise is selling you something.
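(For reference, the intractable quantity in question is the marginal likelihood, and the lower bound people invoke is, in a minimal sketch:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\|\,p(z)\right)$$

The bound itself is exact math; the "go brrr" part is that both $q_\phi$ and $p_\theta$ end up as neural networks we can't say much about.)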
Edit 2: Lol, it's always hilarious getting downvotes from people who can't stand being called out for their bad practices.
dpkingma t1_irojnp3 wrote
VAE author here.
The parent's (picardythird's) comment is, in my experience, a false take on ML. VAEs and many other methods in ML are actually derived from first principles. Which means: start with an idea, do the math/derivations first (which can take considerable time) until you get them right, then implement later.
It's true that a lot of work in ML is different, e.g. hacking up new architectures, testing against benchmarks, and coming up with a theory later. It may be that the parent's experience is more along these lines. But that's not how VAEs (and, I think, diffusion models) were conceived, and it's certainly not 99.99% of methods.
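As a rough illustration of where that derivation lands in code, here is a minimal sketch of the reparameterized VAE objective (not the paper's reference implementation; `encoder` and `decoder` are assumed stand-in networks, with a Bernoulli decoder for concreteness):

```python
import torch

def vae_loss(x, encoder, decoder):
    # Encoder outputs the parameters of q(z|x) = N(mu, diag(sigma^2)).
    mu, log_var = encoder(x)

    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I),
    # so gradients flow through mu and log_var.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * log_var) * eps

    # Reconstruction term E_q[log p(x|z)]; Bernoulli likelihood as a stand-in,
    # so x and x_hat are assumed to lie in [0, 1].
    x_hat = decoder(z)
    recon = torch.nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")

    # KL(q(z|x) || N(0, I)) in closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    # Negative ELBO: minimize reconstruction error plus KL.
    return recon + kl
```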
(I'll try to answer OP in a separate comment.)
picardythird t1_iroxxxd wrote
I'll admit that VAEs are one of the (imo very few) methods that are motivated from first principles. The original VAE paper is actually one of my favorite papers in all of ML; it's well-structured, easy to read, and the steps leading from problem to solution are clear.
This is not at all the case for the vast majority of ML research. How many times have we seen "provably optimal" bounds on something that are then shown to, whoops, not actually be optimal a few years (or months! or weeks!) later? How many times have we seen "theoretical proofs" for why something works, where it turns out that, whoops, the proofs were wrong (looking at you, Adam)? Perhaps most damningly, how many times have we seen a mathematical model or structure that seems to work well in a lab completely fail when exposed to real-world data that does not follow the extraordinarily convenient assumptions and simplifications serving as its foundational underpinnings? This last point is extremely common in nonstandard fields such as continual learning, adversarial ML, and noisy-label learning.
Most certainly, some (perhaps even "much"... I hesitate to say "most" or "all") of the work that comes out of the top top top tier labs is motivated by theory before practice. However, outside of these extraordinary groups it is plainly evident that most methods you see in the wild are empirical, not theoretical.
dpkingma t1_irpaffs wrote
I agree with the (milder) claim that most methods in the wild are empirical.
With regards to Adam: I wouldn't say that method was ad hoc. It was motivated by the ideas in Sections 2 and 3 of the paper (the update rule / initialization bias correction), which are correct. The convergence result in Section 4 should be ignored, but it didn't play a role in the conception of the method, and it wasn't very relevant in practice anyway due to the convexity assumption.
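For reference, the update rule plus bias correction being referred to fits in a few lines; a minimal NumPy sketch (not the paper's pseudocode verbatim):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v start at zero; t is the 1-indexed step count."""
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2

    # Initialization bias correction: because m and v start at zero, the raw
    # averages are biased toward zero early in training; this rescales them.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)

    # Parameter update, with eps guarding against division by zero.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```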
scraper01 t1_irod27t wrote
Stable Diffusion's actual implementation doesn't make sense when compared with the paper's description.
astrange t1_irpk8s2 wrote
Do you have specific examples?
It's obviously true that diffusion models don't work for the reasons they were originally thought to; the Cold Diffusion paper shows this. Also, the Stable Diffusion explainers I've seen describe it as pixel diffusion even though it's latent diffusion. And I'm not sure I understand why latent diffusion works.
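(Roughly, the distinction those explainers blur, as a minimal sketch; all names here are illustrative stand-ins, not the actual Stable Diffusion API:)

```python
import torch

def sample_pixel_diffusion(unet, scheduler, shape):
    x = torch.randn(shape)  # start from Gaussian noise in pixel space
    for t in scheduler.timesteps:
        x = scheduler.step(unet(x, t), t, x)  # denoise pixels directly
    return x

def sample_latent_diffusion(unet, vae, scheduler, latent_shape):
    z = torch.randn(latent_shape)  # noise in the VAE's latent space instead
    for t in scheduler.timesteps:
        z = scheduler.step(unet(z, t), t, z)  # the expensive loop runs on latents
    return vae.decode(z)  # one decoder pass maps latents back to pixels
```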
aviisu OP t1_iro0trj wrote
Thanks for your insight. Indeed, I genuinely don't understand people who intentionally put a bunch of complex math in a paper instead of trying to guide readers with intuition and make the paper more accessible. But then again, I hear it gets accepted more easily that way, so...
master3243 t1_irq9k6g wrote
That's just how math is done in research. If you don't like that, then you'll hate pure math papers even more, where the paper states a theorem and then walks through the steps that establish it as true.
The intuition behind how the author came up with that line of thinking and arrived at the final theorem is (justifiably) left entirely in the author's scratch paper or notebooks.
Some authors do give insight into the steps they took or their general intuition, which is always nice, but it's not a requirement.
It's also worth mentioning that a lot of us like doing research but don't like writing research papers (which are just a necessity, given that humans lack telepathic communication), so giving more info is an optional step in a disliked process, which makes it understandable why it's often skipped.