Submitted by Tea_Pearce t3_10aq9id in MachineLearning
ml-research t1_j45nvno wrote
Yes, I guess feeding more data to larger models will be better in general.
But what should we (especially those of us without access to large computing resources) do while waiting for compute to get cheaper? Maybe balance the amount of inductive bias against the expected gains from scaling, to bring the predicted improvements forward a bit?
mugbrushteeth t1_j45xihj wrote
One dark outlook on this: compute costs fall very slowly (or not at all), and large models become something only the rich can run. Using the capital they earn from those models, they reinvest and further accelerate development toward even larger models, which become inaccessible to most people.
dimsycamore t1_j46jj4p wrote
Already happening unfortunately
anonsuperanon t1_j47g6e3 wrote
Literally just the history of all technology, which suggests saturation given enough time.
currentscurrents t1_j4702g0 wrote
Compute is going to get cheaper over time though. My phone today has the FLOPs of a supercomputer from 1999.
Also, if LLMs become the next big thing, you can expect GPU manufacturers to include more VRAM and more hardware acceleration aimed at them.
RandomCandor t1_j47bx4j wrote
To me, all that means is that laypeople will always be a generation behind what the rich can afford to run
currentscurrents t1_j48csbo wrote
If it is true that performance scales infinitely with compute power - and I kinda hope it is, since that would make superhuman AI achievable - datacenters will always be smarter than PCs.
That said, I'm not sure that it does scale infinitely. You need not just more compute but also more data, and there's only so much data out there. GPT-4 reportedly won't be any bigger than GPT-3 because even terabytes of scraped internet data isn't enough to train a larger model.
BarockMoebelSecond t1_j48mepq wrote
Which is, and has been, the status quo for the entire history of computing. I don't see how that's a new development?
currentscurrents t1_j490rvn wrote
It's meaningful right now because there's a threshold where LLMs become awesome, but getting there requires expensive specialized GPUs.
I'm hoping in a few years consumer GPUs will have 80GB of VRAM or whatever and we'll be able to run them locally. While datacenters will still have more compute, it won't matter as much since there's a limit where larger models would require more training data than exists.
Playful_Ad_7555 t1_j49k8p2 wrote
Silicon computing is already very close to its limit based on foreseeable technology. The exponential explosion in computing power and available data from 2000-2020 isn't going to be replicated.
Opposite-Platypus-99 t1_j4ahpg6 wrote
Now, can you confirm you can run arbitrary software on your phone?
bloc97 t1_j49ft0g wrote
My bet is on "mortal computers" (term coined by Hinton). Our current methods for training deep nets are extremely inefficient: CPUs and GPUs basically have to load data, process it, then save it back to memory. We could eliminate this bandwidth limitation by essentially printing a very large differentiable memory cell, with hardware connections inside representing the connections between neurons, which would allow us to do inference or backprop in a single step.
gdiamos t1_j4a96pu wrote
Currently we have exascale computers, e.g. 1e18 flops at around 50e6 watts.
The power output of the sun is about 4e26 watts. That's 20 orders of magnitude on the table.
This paper claims that energy of computation can theoretically be reduced by another 22 orders of magnitude. https://arxiv.org/pdf/quant-ph/9908043.pdf
So physics (as we currently understand it) seems to allow learning machines that are at least 42 orders of magnitude bigger (computationally) than current-generation foundation models, without leaving this solar system and without converting mass into energy...
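A quick back-of-the-envelope check of those numbers (rounded inputs, so the total lands a shade under the quoted 42):

```python
import math

# Rough inputs from the argument above
exascale_watts = 50e6   # ~50 MW for a ~1e18 FLOP/s machine
solar_watts = 4e26      # total power output of the sun
landauer_orders = 22    # extra orders of magnitude claimed in the linked paper

power_orders = math.log10(solar_watts / exascale_watts)
print(f"power headroom: ~{power_orders:.1f} orders of magnitude")       # ~18.9
print(f"total headroom: ~{power_orders + landauer_orders:.0f} orders")  # ~41
```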
visarga t1_j46af21 wrote
Exfiltrate the large language models - get them to (pre)label your data. Then use this data to fine-tune a small and efficient HF model. You only pay for the training data.
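A minimal sketch of that pipeline for a text-classification task. `label_with_large_model()` is a hypothetical placeholder for whichever big-model API you actually call, and the model/column names are just illustrative:

```python
# Sketch: pseudo-label raw text with a big model, then fine-tune a small
# Hugging Face classifier on those labels.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def label_with_large_model(text: str) -> int:
    # Hypothetical placeholder: call whatever large model you have access to
    # and map its answer to a class id. Stubbed out here.
    return 1 if "great" in text.lower() else 0

raw_texts = ["this movie was great", "total waste of two hours"]
pseudo_labels = [label_with_large_model(t) for t in raw_texts]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

ds = Dataset.from_dict({"text": raw_texts, "label": pseudo_labels})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True,
                                padding="max_length", max_length=64),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilled-classifier",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
)
trainer.train()
```

The labeled set here is tiny just to keep the sketch readable; in practice you'd pseudo-label thousands of examples before fine-tuning.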
currentscurrents t1_j4716tp wrote
Try to figure out systems that can generalize from smaller amounts of data? It's the big problem we all need to solve anyway.
There's a bunch of promising ideas that need more research:
- Neurosymbolic computing
- Expert systems built out of neural networks
- Memory augmented neural networks (toy sketch after this list)
- Differentiable neural computers
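To make one of those concrete, here's a toy memory-augmented network in PyTorch: a learned external memory read with content-based attention. Sizes are arbitrary and this only illustrates the general idea, not any particular paper's architecture.

```python
import torch
import torch.nn as nn

class TinyMANN(nn.Module):
    def __init__(self, in_dim=16, mem_slots=32, mem_dim=16, out_dim=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(mem_slots, mem_dim))  # learned external memory
        self.query = nn.Linear(in_dim, mem_dim)        # controller input -> read key
        self.head = nn.Linear(in_dim + mem_dim, out_dim)

    def forward(self, x):
        # Content-based addressing: softmax similarity between query and slots.
        attn = torch.softmax(self.query(x) @ self.memory.T, dim=-1)
        read = attn @ self.memory                      # weighted read vector
        return self.head(torch.cat([x, read], dim=-1))

model = TinyMANN()
y = model(torch.randn(8, 16))   # batch of 8 inputs -> (8, 4) logits
print(y.shape)
```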
boss_007 t1_j48qyxu wrote
You don't have a dedicated TPU cluster in your lab? Pffftt