Submitted by aviisu t3_xzpc0w in MachineLearning
I'm currently learning about diffusion models and variational autoencoders. I came across some great videos (1, 2, and more; the first one in particular is quite nice) that do an impressive job of breaking down the math and (some of) the intuition behind it. I think I understand 50%-75% of them for now, and with some practice I think it will become clearer.
The common theme I encountered (not just with diffusion models) is that something is intractable. Not just one thing, but a series of intractable quantities, one after another. The paper's authors then apply a bunch of tricks, mathematical gymnastics, and assumptions, or straight-up eliminate some terms because they don't matter (which, TBH, can be justified in its own way). And then, for some reason, all the terms magically cancel each other out, leaving us with two lines of nice, beautiful equations.
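To make this concrete, here's the one example of this pattern I (mostly) follow so far: my sketch of the standard VAE lower-bound derivation from those videos, where the intractable integral gets bounded and everything collapses into two tractable terms.

```latex
% The marginal likelihood is intractable because of the integral over z:
%   \log p_\theta(x) = \log \int p_\theta(x \mid z)\, p(z)\, dz
% The trick: introduce an approximate posterior q_\phi(z \mid x),
% multiply and divide by it, then apply Jensen's inequality:
\log p_\theta(x)
  = \log \int q_\phi(z \mid x)\,
      \frac{p_\theta(x \mid z)\, p(z)}{q_\phi(z \mid x)}\, dz
  \geq \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
     \;-\; \underbrace{\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)}_{\text{regularizer}}
% The intractable posterior p(z \mid x) never has to be computed;
% both remaining terms are tractable when q and p are Gaussian.
```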
My questions would be:

- What kind of thought process do researchers have in mind when choosing what to do with an intractable equation? For me (someone who is not in the math field), it is like looking for a needle in a haystack: there are many possible directions to go in, and it's not just one step until some terms finally cancel each other out at the end.

- How can they be confident that what they are doing will be fruitful in the end, and that they will arrive at a satisfying and practical result? I've heard that sometimes people just do the implementation first and then do the math later to justify the result, but I don't think that's always how it's done.
I would like to hear people share their experience with this. And if anyone has resources for learning this kind of mathematics, or the mental framework behind it, please share.
(I understand that I still have a long way to go, especially on the math side. My questions may make more sense to me in the future, with enough knowledge and experience.)
picardythird t1_irnu71n wrote
Let's be honest: 99.99% of the time the math is a post-hoc, hand-wavy description of the architecture that happened to work after dozens or hundreds of iterations of architecture search. End up with an autoencoder? Great, you have a latent space. Did you perform normalization or add noise? Fantastic, you can make (reductive and unjustified) claims about probabilistic interpretations using simple distributions. Know some linear algebra? Awesome, you can take pretty much any vaguely accurate observation about the architecture and use it to support whatever interpretation you want.
There is a vanishingly small (but admittedly nonzero) amount of research that starts with a theoretical formulation and designs a hypothesis test around it. Much, much, much more common is that researchers start with a high-level (but ultimately heuristic) systems approach to architecture selection, and then twiddle and tweak the hyperparameters until the numbers look good enough to publish. Then, maybe, they look back and conjure up some vague bullshit involving math notation, and claim that as "theoretical justification," knowing that 99% of the time no one is going to read it, much less check their work. Bonus points if they make the notation intentionally confusing, so as to dissuade people from looking too closely. If you can mix in as many fonts, subscripts, superscripts, and set-notation symbols as possible, most people's eyes will glaze over and they will assume you're right.
It's sad, but it's the reality of the state of our field.
Edit: As for intractable equations, some people are going to answer with talk of variational inference and theoretical guarantees about lower bounds. This is nice and all, but the real answer is that "neural networks go brrr." Neural networks are used as function approximators, and except for some high level architectural descriptions, no one actually knows how or why neural networks learn the functions that they do (or even if they learn the intended functions at all). Anyone that tells you otherwise is selling you something.
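To spell out what "neural networks go brrr" means in practice: you take the intractable posterior, swap in a network, and let SGD optimize the bound. A minimal sketch in PyTorch (made-up sizes, nobody's actual model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE: the encoder is a neural network standing in for the
    intractable posterior p(z|x); everything else is 'go brrr'."""
    def __init__(self, x_dim=784, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z in a differentiable way.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def neg_elbo(x, x_logits, mu, logvar):
    # Reconstruction term: E_q[log p(x|z)] for Bernoulli pixels.
    rec = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form -- this is where the
    # "magic cancellation" of Gaussian terms happens.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl  # negative lower bound, minimized by SGD

# Usage: one optimization step on a fake batch.
model = TinyVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)  # hypothetical batch of flattened images
x_logits, mu, logvar = model(x)
loss = neg_elbo(x, x_logits, mu, logvar)
opt.zero_grad(); loss.backward(); opt.step()
```

Note that nothing in that loop knows or cares whether q actually approximates the true posterior; the network is just a function approximator trained until the loss goes down.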
Edit 2: Lol, it's always hilarious getting downvotes from people who can't stand being called out for their bad practices.