Ok_Swordfish5638 t1_irobilk wrote on October 9, 2022 at 7:56 PM

Reply to [D] What kind of mental framework/thought process the researchers have when working on solving/proving the math of the new algorithms? by aviisu

First off, keep in mind that this is difficult work, and the mathematical story you’re reading is one that has been cleaned up and streamlined for presentation. The blind alleys, dead ends, and confusion have been filtered out for you, giving the illusion that the author had it all figured out from the start. That’s not true, and often the complicated steps that seem like magic are the result of a researcher chewing on a particular step for quite a while before finding the right way to proceed. The process of figuring it out often involves trying different ways of looking at a problem until you hit upon one that works.

A lot of prior experience plays into it as well. After having solved many similar problems and worked through other derivations, you start to get a feel for what might work in certain situations. At the same time, as you do more of this kind of work, your “bag of tricks” expands, so you have more tools that you can bring to bear on new problems. There’s not really a substitute for this other than experience and practice, similar to how an expert programmer has practiced their skills for years to reach that point.

Oftentimes, you have a strong intuition about what the final result will look like qualitatively. This helps determine whether or not it’s worth grinding through the math to make sure you get all the details right, and guides you in how to approach each step. It’s usually not the case that you start a derivation without having some idea where it will lead you, though sometimes the initial intuition doesn’t make it into the final presentation. Having a good reason to believe that the thing you’re trying to derive or prove is useful is very important before starting out.

The more clearly you can state what you’re trying to do at the outset, the better off you will be. For a proof, this takes the form of very clearly stating what you’re trying to prove, while for something like the topics you’re studying currently this might take the form of having a clear idea of which parts of the cost function you’re trying to simplify or improve. There’s no guarantee you’ll be successful, but you also don’t read about the unsuccessful attempts.