
ml-research t1_j45nvno wrote

Yes, I guess feeding more data to larger models will generally be better.
But what should we (especially those of us without access to large computing resources) do while waiting for compute to get cheaper? Maybe balance the amount of inductive bias against the improvement in performance, to bring the predicted improvements forward a bit?

36

mugbrushteeth t1_j45xihj wrote

One dark outlook on this: compute costs fall very slowly (or not at all), and the large models become ones that only the rich can run. Using the capital they earn from those models, they reinvest and accelerate development toward even larger models, and the models become inaccessible to most people.

45

anonsuperanon t1_j47g6e3 wrote

Literally just the history of all technology, which suggests saturation given enough time.

11

currentscurrents t1_j4702g0 wrote

Compute is going to get cheaper over time, though. My phone today has the FLOPs of a supercomputer from 1999.

Also, if LLMs become the next big thing, you can expect GPU manufacturers to add more VRAM and more hardware acceleration aimed at them.

9

RandomCandor t1_j47bx4j wrote

To me, all that means is that laypeople will always be a generation behind what the rich can afford to run.

8

currentscurrents t1_j48csbo wrote

If it is true that performance scales infinitely with compute power (and I kinda hope it is, since that would make superhuman AI achievable), datacenters will always be smarter than PCs.

That said, I'm not sure that it does scale infinitely. You need not just more compute but also more data, and there's only so much data out there. GPT-4 reportedly won't be any bigger than GPT-3 because even terabytes of scraped internet data isn't enough to train a larger model.
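The data bottleneck described above can be made concrete with the compute-optimal rule of thumb from the Chinchilla paper (Hoffmann et al., 2022): train on roughly 20 tokens per parameter. A back-of-the-envelope sketch (the function name and exact ratio are illustrative):

```python
# Rough data requirement under the "Chinchilla" compute-optimal heuristic
# of ~20 training tokens per model parameter. Illustrative only.
def compute_optimal_tokens(n_params, tokens_per_param=20):
    return n_params * tokens_per_param

gpt3_params = 175e9  # GPT-3's parameter count
needed = compute_optimal_tokens(gpt3_params)
print(f"{needed:.1e} tokens")  # 3.5e+12, ~10x the ~300B tokens GPT-3 was actually trained on
```

Under this heuristic, a model much larger than GPT-3 quickly demands more high-quality text than has been scraped.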

6

BarockMoebelSecond t1_j48mepq wrote

Which is and has been the status quo for the entire history of computing; I don't see how that's a new development.

3

currentscurrents t1_j490rvn wrote

It's meaningful right now because there's a threshold where LLMs become awesome, but getting there requires expensive specialized GPUs.

I'm hoping that in a few years consumer GPUs will have 80GB of VRAM or whatever, and we'll be able to run them locally. Datacenters will still have more compute, but it won't matter as much, since beyond a certain size a larger model would require more training data than exists.

3

Playful_Ad_7555 t1_j49k8p2 wrote

Silicon computing is already very close to its limit, based on foreseeable technology. The exponential explosion in computing power and available data from 2000-2020 isn't going to be replicated.

1

bloc97 t1_j49ft0g wrote

My bet is on "mortal computers" (a term coined by Hinton). Our current methods for training deep nets are extremely inefficient: CPUs and GPUs basically have to load data, process it, then save it back to memory. We could eliminate this bandwidth limitation by printing what is essentially a very large differentiable memory cell, with hardware connections inside representing the connections between neurons, which would let us do inference or backprop in a single step.

2

gdiamos t1_j4a96pu wrote

Currently we have exascale computers, e.g. 1e18 flops at around 50e6 watts.

The power output of the sun is about 4e26 watts. That's roughly 19 orders of magnitude on the table.

This paper claims the energy cost of computation can theoretically be reduced by another 22 orders of magnitude: https://arxiv.org/pdf/quant-ph/9908043.pdf

So physics (as we currently understand it) seems to allow learning machines roughly 41 orders of magnitude bigger (computationally) than current-generation foundation models, without leaving this solar system and without converting mass into energy...
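The arithmetic above can be checked in a few lines (a rough sketch; the exact headroom depends on rounding):

```python
import math

# Rough check of the headroom arithmetic in the comment above.
exascale_watts = 50e6  # power draw of a ~1e18 FLOP/s machine
sun_watts = 4e26       # approximate total power output of the sun

# Orders of magnitude of raw power available before leaving the solar system.
power_headroom = math.log10(sun_watts / exascale_watts)

# Theoretical reduction in energy per operation claimed by the cited paper.
efficiency_headroom = 22

print(round(power_headroom, 1))                     # ~18.9 orders from power alone
print(round(power_headroom + efficiency_headroom))  # ~41 orders of magnitude total
```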

1

visarga t1_j46af21 wrote

Exfiltrate the large language models: get them to (pre)label your data, then use that data to fine-tune a small, efficient HF model. You only pay for the training data.
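A minimal sketch of that workflow, with a keyword rule standing in for the expensive large-model labeling call and a toy bag-of-words perceptron standing in for the small fine-tuned model (all names and data here are hypothetical placeholders, not a real pipeline):

```python
# Stand-in for the large-model API call that (pre)labels your data.
def teacher_label(text):
    return 1 if "good" in text or "great" in text else 0

# Unlabeled corpus -> teacher-labeled training set.
unlabeled = ["great movie", "bad plot", "good acting", "terrible pacing"]
dataset = [(t, teacher_label(t)) for t in unlabeled]

# Tiny bag-of-words perceptron as the cheap "student" model.
vocab = sorted({w for t, _ in dataset for w in t.split()})
weights = {w: 0.0 for w in vocab}
bias = 0.0
for _ in range(10):  # a few passes over the teacher-labeled data
    for text, y in dataset:
        pred = 1 if bias + sum(weights[w] for w in text.split()) > 0 else 0
        if pred != y:  # perceptron update on mistakes
            for w in text.split():
                weights[w] += y - pred
            bias += y - pred

def student_predict(text):
    score = bias + sum(weights.get(w, 0.0) for w in text.split())
    return 1 if score > 0 else 0

print(student_predict("good movie"))   # 1
print(student_predict("bad pacing"))   # 0
```

At inference time only the small student runs; the large model was only queried once to build the training set.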

11

currentscurrents t1_j4716tp wrote

Try to figure out systems that can generalize from smaller amounts of data? It's the big problem we all need to solve anyway.

There's a bunch of promising ideas that need more research:

  • Neurosymbolic computing
  • Expert systems built out of neural networks
  • Memory augmented neural networks
  • Differentiable neural computers
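As one concrete illustration of the last two items: the core read operation in a memory-augmented network (NTM/DNC-style content-based addressing) is softmax attention over external memory slots. This is a toy sketch of the idea, not any particular paper's implementation:

```python
import math

# Content-based read from external memory: a query vector attends over
# memory slots via softmax-weighted dot products (the differentiable
# analogue of a key-value lookup).
def content_read(query, memory):
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    mx = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Read vector = attention-weighted sum of the memory slots.
    return [sum(w * slot[i] for w, slot in zip(attn, memory))
            for i in range(len(memory[0]))]

memory = [[1.0, 0.0], [0.0, 1.0]]
print(content_read([5.0, 0.0], memory))  # mostly slot 0: ~[0.99, 0.01]
```

Because every step is differentiable, the read weights can be trained end-to-end with the rest of the network.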

8

boss_007 t1_j48qyxu wrote

You don't have a dedicated tpu cluster in your lab? Pffftt

2