gdahl t1_j6uc1bh wrote
Reply to comment by Screye in [D] What does a DL role look like in ten years? by PassingTumbleweed
Deep learning existed as a field in 2012. The speech recognition community had already adopted deep learning by that point. The Brain team at Google already existed. Microsoft, IBM, and Google were all using deep learning. Researchers started to coalesce around "deep learning" as a brand for an academic subfield in 2006, but it was certainly very niche at that point.
gdahl t1_j6ubet7 wrote
Deep learning roles 10 years ago (in 2013) were pretty similar to what they look like now, except that they are much more numerous now. I'm sure there will be some changes and a proliferation of entry-level and "neural network technician" roles, but it isn't going to be that different.
gdahl t1_iyy504n wrote
Reply to comment by Conscious_Heron_9133 in [D] PyTorch 2.0 Announcement by joshadel
Have you tried Dex? https://github.com/google-research/dex-lang It is in a relatively early stage, but it is exploring some interesting parts of the design space.
gdahl t1_iqpg125 wrote
Reply to [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
People use lots of other things too. Probit regression, Poisson likelihoods, all sorts of stuff. As you said, it is best to fit what you are doing to the problem.
Logistic-regression-style output layers are very popular in deep learning, perhaps even more so than in other parts of the ML community. But Gaussian Process Classification is often done with probit models (see http://gaussianprocess.org/gpml/chapters/RW3.pdf ). However, if necessary, people will design neural network output activation functions and losses to fit the problem they are solving.
That said, a lot of people doing deep learning joined the field in the past two years and just use what they see other people using, without giving it much thought. So we get these extremely popular cross-entropy losses.
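For example, here is a minimal PyTorch-style sketch (the layer sizes and names are just illustrative, not from any particular system) of swapping the usual sigmoid/cross-entropy output for a probit link or a Poisson likelihood:

```python
import torch
import torch.nn as nn

# A toy network; the final Linear layer produces one real-valued output per example.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

def probit_nll(scores, targets):
    # Probit link: P(y = 1 | x) = Phi(f(x)), the standard normal CDF,
    # instead of the usual logistic sigmoid.
    normal = torch.distributions.Normal(0.0, 1.0)
    p = normal.cdf(scores.squeeze(-1)).clamp(1e-6, 1 - 1e-6)
    return -(targets * p.log() + (1 - targets) * (1 - p).log()).mean()

def poisson_nll(log_rate, counts):
    # Poisson likelihood for count-valued targets; the network outputs log(rate).
    return nn.functional.poisson_nll_loss(log_rate.squeeze(-1), counts, log_input=True)

x = torch.randn(8, 10)
loss_probit = probit_nll(net(x), torch.randint(0, 2, (8,)).float())
loss_poisson = poisson_nll(net(x), torch.randint(0, 5, (8,)).float())
```

The point is just that the output layer and loss are a modeling choice like any other, not something the framework forces on you.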
gdahl t1_iqpf8j8 wrote
Reply to [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
Adam is more likely to outperform steepest descent (full batch GD) in the full batch setting than it is to outperform SGD at batch size 1.
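To make the comparison concrete, here is a minimal sketch (my own illustration, with an invented toy dataset) of the full-batch setting, where every optimizer step uses the gradient of the loss over the entire dataset; swapping the SGD line for the commented-out Adam line gives the Adam variant:

```python
import torch
import torch.nn as nn

# Toy data small enough that "full batch" means every step sees all of it.
X = torch.randn(512, 20)
y = torch.randint(0, 2, (512,)).float()

model = nn.Linear(20, 1)
loss_fn = nn.BCEWithLogitsLoss()

opt = torch.optim.SGD(model.parameters(), lr=0.1)      # steepest descent on the full batch
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # drop-in Adam alternative

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)  # exact loss over the whole dataset, no minibatching
    loss.backward()
    opt.step()
```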
gdahl t1_j6upct4 wrote
Reply to comment by [deleted] in [D] What does a DL role look like in ten years? by PassingTumbleweed
I would say the turning point was when we published the first successful large-vocabulary results with deep acoustic models in April 2011, based on work conducted over the summer of 2010. When we published the paper you mention, it was to recognize that these techniques were the new standard in top speech recognition groups.
Regardless, there were deep learning roles in tech companies in 2012, just not very many of them compared to today.