HateRedditCantQuitit t1_iyj6yb6 wrote
Reply to comment by CyberPun-K in [R] Statistical vs Deep Learning forecasting methods by fedegarzar
14 days is roughly 20k minutes, so it’s about 6.7 minutes per time series. I don’t know how many models are in the ensemble, but let’s assume 13 for even math, which puts the average deep model at about 30 seconds of training per time series.
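Spelled out (the series count is my assumption, backed out from the ~6.7 min/series figure; only the 14-day total comes from the post):

```python
# Back-of-the-envelope: wall-clock budget per model per series.
# Assumptions: ~3,000 series (implied by ~6.7 min/series) and a 13-model ensemble.
total_minutes = 14 * 24 * 60                 # 20,160 minutes in 14 days
n_series = 3_000                             # assumption
n_models = 13                                # assumption, picked for even math
per_series = total_minutes / n_series        # ~6.7 min per series
per_model = per_series * 60 / n_models       # ~31 s per model per series
print(f"{per_series:.1f} min/series, {per_model:.0f} s/model/series")
```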
Is that so crazy?
HateRedditCantQuitit t1_ix4d4sx wrote
Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
You can scale semi-supervised learning much more easily, cheaply, and safely than you can scale human-in-the-loop RL. It’s similar to why we don’t train self-driving cars by putting them in the real world and making them learn by RL.
If we could put a language model in a body and let it learn safely through human tutoring in a time-effective and cost-effective way, maybe it would be worthwhile. Today, it doesn’t seem to be the time-effective or cost-effective solution.
And while I’m on my podium, once LMs are in any loop commercially talking to people at scale, I expect this will be a huge topic.
Tangentially, check out this short story/novella that kinda explores the idea from a fictional perspective. It’s incredibly well written and interesting, by a favorite author of mine: “The Lifecycle of Software Objects” by Ted Chiang https://web.archive.org/web/20130306030242/http://subterraneanpress.com/magazine/fall_2010/fiction_the_lifecycle_of_software_objects_by_ted_chiang
HateRedditCantQuitit t1_iuif6gz wrote
Take a look at the connections between diffusion and score models. Viewed that way, diffusion is annealed MCMC. The score-model side of the literature is deep in MCMC territory. A good starting point is Yang Song’s blog post on score models: https://yang-song.net/blog/2021/score/
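If it helps make the annealed-MCMC framing concrete, here’s a minimal sketch of annealed Langevin dynamics. The score function below is a stand-in (the analytic score of a noised standard Gaussian); in practice you’d plug in a trained score network, and the noise schedule and step sizes here are made up:

```python
import numpy as np

def score(x, sigma):
    # Stand-in for a learned score model s_theta(x, sigma) ≈ grad_x log p_sigma(x).
    # Here: the analytic score of a standard Gaussian convolved with noise level sigma.
    return -x / (1.0 + sigma**2)

def annealed_langevin(n_samples=1000, dim=2, sigmas=(10, 5, 2, 1, 0.5, 0.1),
                      steps_per_level=100, eps=0.01, seed=0):
    # Annealed Langevin MCMC: run Langevin steps at each noise level,
    # from large sigma (easy to mix) down to small sigma (close to the data).
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_samples, dim)) * sigmas[0]
    for sigma in sigmas:
        step = eps * (sigma / sigmas[-1]) ** 2   # common step-size heuristic
        for _ in range(steps_per_level):
            noise = rng.normal(size=x.shape)
            x = x + 0.5 * step * score(x, sigma) + np.sqrt(step) * noise
    return x

samples = annealed_langevin()
print(samples.mean(axis=0), samples.std(axis=0))  # roughly zero mean, unit-ish std
```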
HateRedditCantQuitit t1_ir17eu2 wrote
It’s impossible to say what the future of such a fast-moving field looks like without specifying a timescale. Will low-code ML replace hand-coded ML in the next year or two? Probably not. What will the main industrial applications of ML be in 5-6 years? Shit moves so fast it’s hard to say.
HateRedditCantQuitit t1_iqx446n wrote
f(x) = s(Wx + b) isn’t all of deep learning.
You may have heard of transformers, which are closer to s(X W X^T) (but actually more involved than that). They’re an extremely popular model right now.
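For a concrete contrast, here’s a minimal numpy sketch (single head, no masking or multi-head machinery; all names and shapes are mine): a dense layer transforms each token independently as s(Wx + b), while self-attention puts the input on both sides via softmax(XW_q (XW_k)^T) XW_v:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))            # 5 tokens, 8 features

# "s(Wx + b)": a plain dense layer, each token transformed independently.
W, b = rng.normal(size=(8, 8)), rng.normal(size=8)
dense_out = sigmoid(X @ W + b)

# Single-head self-attention: the input appears on both sides, s(X W X^T)-ish.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
attn = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(8))    # (5, 5) token-token weights
attn_out = attn @ (X @ Wv)                            # (5, 8) mixed tokens
print(dense_out.shape, attn_out.shape)
```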
HateRedditCantQuitit t1_iqnfxvb wrote
Reply to [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
Log linear probabilities are really convenient.
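e.g. with a logistic model p = sigmoid(w·x + b), the log-odds log(p / (1 - p)) are exactly the linear score, which is what makes the algebra so clean. A toy check (numbers are arbitrary):

```python
import numpy as np

# With p = sigmoid(w.x + b), the log-odds log(p / (1 - p)) equal the linear score w.x + b.
rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.3
x = rng.normal(size=4)

z = w @ x + b
p = 1.0 / (1.0 + np.exp(-z))
print(np.log(p / (1 - p)), z)   # identical up to float error
```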
HateRedditCantQuitit t1_iztdm8c wrote
Reply to [D] Does Google TPU v4 compete with GPUs in price/performance? by Shardsmp
> but I'm still wondering whether its worth it to just build a graphics card rig for the long term.
Pretty much never, assuming it's for personal use.
If you're going to use this rig exclusively for ML, then maybe it still makes sense. The calculation becomes simple:
(cost to buy) + (energy cost per hour of use) × (hours of use before it no longer fits your needs)

vs.

(cloud cost for the same amount of work)

If you use it enough for this to make sense, you might also be surprised how quickly you outgrow it (e.g. maybe you’ll want to run some experiments in parallel sometimes, or you’ll want to use models bigger than this thing’s VRAM in a year or few).

If you want to use it for non-ML use, then no, just use the cloud. If you’re using it enough that the above calculation says to buy, then you won’t actually get to use it for non-ML purposes, which will just annoy the hell out of you.
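As a rough illustration of that break-even math (every number below is a placeholder; swap in your own hardware, power, and cloud prices):

```python
# Toy break-even check: local rig vs. cloud. All numbers are placeholders.
rig_cost = 2500.0      # USD, GPU plus the rest of the box
power_kw = 0.45        # kW drawn under load
energy_price = 0.30    # USD per kWh
cloud_rate = 1.10      # USD per hour for a comparable cloud GPU

gpu_hours = 2000       # hours of actual use before the rig no longer fits your needs

local_total = rig_cost + gpu_hours * power_kw * energy_price
cloud_total = gpu_hours * cloud_rate
print(f"local: ${local_total:,.0f}  cloud: ${cloud_total:,.0f}")
```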