Submitted by aviisu t3_xzpc0w in MachineLearning
I'm currently learning about diffusion models and variational autoencoders. I've encountered some great videos (1, 2, and more; the first one in particular is quite nice) that do an impressive job of breaking down the math and (some of) the intuitions. I think I understand 50%-75% of them for now, and with some practice it should become clearer.
The common theme I encountered (not just with diffusion models) is that something is intractable. Not just one thing but a series of intractable steps, one after another. The paper's authors then apply a bunch of tricks, mathematical gymnastics, and assumptions, or straight-up eliminate some terms because they don't matter (which, TBH, can be justified in its own way). And then, for some reason, all the terms just magically cancel each other out, leaving us with two lines of nice, beautiful equations.
My questions would be:

- What kind of thought process do researchers have in mind while choosing what to do with an intractable equation? For me (someone who is not in the math field), it is like looking for a needle in a haystack: there are many possible directions to go, and it's not just one step until some terms finally cancel each other out at the end.

- How can they be confident that what they are doing will be fruitful in the end, and that they will arrive at a satisfying and practical result? I've heard that sometimes people just do the implementation first and do the math later to justify the result, but I don't think that's always how it's done.
I would like to hear people share their experiences with this. And if anyone has resources for learning this particular kind of mathematics, or the mental framework behind it, please share.
(I understand that I still have a long way to go, especially on the math side. My question may make more sense to me in the future, with enough knowledge and experience.)
dpkingma t1_irooyas wrote
Author of VAE/diffusion model papers here.
When a paper describes something as intractable, it typically means that it can't be computed exactly in a computationally feasible way, which motivates the use of approximations. The challenge is then to come up with an approximation that has nice properties (e.g. computationally cheap, unbiased, or a bound).
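To make that concrete, here is a minimal sketch of the canonical VAE example (the toy Gaussian model and all names below are my own illustration, not from any particular paper): the marginal likelihood log p(x) = log ∫ p(x|z) p(z) dz is intractable in general, while the ELBO is an approximation with exactly those nice properties: it's a lower bound, and the reparameterization trick gives a cheap, unbiased, differentiable Monte Carlo estimate of it.

```python
import torch

# Hypothetical toy latent-variable model, for illustration only:
#   p(z) = N(0, I),  p(x|z) = N(z, I)
# log p(x) = log ∫ p(x|z) p(z) dz  -- intractable in general.

def log_joint(x, z):
    log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
    log_lik = torch.distributions.Normal(z, 1.0).log_prob(x).sum(-1)
    return log_prior + log_lik

def elbo(x, q_mean, q_logstd, n_samples=16):
    # Tractable lower bound on log p(x): E_q[log p(x,z) - log q(z|x)].
    # rsample() implements the reparameterization trick (z = mean + std * eps),
    # so the Monte Carlo estimate of the bound is cheap, unbiased, and
    # differentiable w.r.t. the variational parameters.
    q = torch.distributions.Normal(q_mean, q_logstd.exp())
    z = q.rsample((n_samples,))
    return (log_joint(x, z) - q.log_prob(z).sum(-1)).mean(0)

# Usage: fit q(z|x) for a single observation by maximizing the bound.
x = torch.tensor([0.5, -1.0])
q_mean = torch.zeros(2, requires_grad=True)
q_logstd = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([q_mean, q_logstd], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = -elbo(x, q_mean, q_logstd)
    loss.backward()
    opt.step()
```

The "magic cancellation" the post asks about shows up here too: expanding E_q[log p(x,z) - log q(z|x)] rearranges into a reconstruction term minus a KL term, and for Gaussians the KL has a closed form, which is why the final equations look so clean.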
As another commenter also wrote, how an idea is presented in a paper is typically different from how the author(s) came up with it. At the start of a research project you often begin with a concrete problem and some vague intuitions about how to solve it. Then, through an iterative process, you refine your ideas. Often it turns out that what you wanted to do is impossible. Often it turns out that a solution already exists, so there's no need to publish. The process usually requires lots of backtracking and scrapping of dead ends, and it often takes time before an idea finally 'clicks' in your mind (which is a great feeling).

Especially when you start out in research, nine out of ten ideas turn out to be either already solved or (seemingly) unsolvable, which can be very frustrating. And since negative results are typically unpublishable or uninteresting, you don't read about them. The flip side is that with more experience you build a bigger mental toolbox, making it easier to spot dead ends and see opportunities. The best way to get there is to read ML books (or other media) that teach you the underlying math, plus lots of practice.