ml-research t1_j45nvno wrote on January 13, 2023 at 9:39 AM

#1,348,832

Yes, I guess feeding more data to larger models will be better in general.
But what should we (especially those of us without access to large computing resources) do while waiting for computation to become cheaper? Maybe balance the amount of inductive bias against the improvement in performance, to bring the predicted improvements a bit earlier?

chimp73 t1_j45vsgb wrote on January 13, 2023 at 11:24 AM

#1,349,164

Bitter lesson 3.0: The entire idea of fine-tuning on a large pre-trained model goes out of the window when you consider that the creators of the foundation model can afford to fine-tune it even more than you because fine-tuning is extremely cheap for them and they have way more compute. Instead of providing API access to intermediaries, they can simply sell services to the customer directly.

L43 t1_j45wbf1 wrote on January 13, 2023 at 11:30 AM

#1,349,196

Replying to chimp73 (#1,349,164)

Yeah I have a pretty dystopian outlook on the future because of this.

JustOneAvailableName t1_j45wg3c wrote on January 13, 2023 at 11:32 AM

#1,349,205

"In 70 years" feels extremely cautious. I would say it's in the next few years for regular ML, perhaps 20 years for robotics

mugbrushteeth t1_j45xihj wrote on January 13, 2023 at 11:44 AM

#1,349,261

Replying to ml-research (#1,348,832)

One dark outlook is that compute costs fall very slowly (or not at all), so large models become something only the rich can run. Using the capital they earn from those models, they reinvest and accelerate development toward even larger models, which become inaccessible to most people.

Tea_Pearce OP t1_j4689f6 wrote on January 13, 2023 at 1:28 PM

#1,349,799

Replying to JustOneAvailableName (#1,349,205)

fair point, I suppose that timeframe was simply used to be consistent with the original lesson.

visarga t1_j46af21 wrote on January 13, 2023 at 1:45 PM

#1,349,931

Replying to ml-research (#1,348,832)

Exfiltrate the large language models: get them to (pre)label your data, then use that data to fine-tune a small, efficient HF (Hugging Face) model. You only pay for the training data.
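
A minimal sketch of that recipe, assuming a text-classification task. `hypothetical_llm_label` is a made-up stand-in for a paid call to a large model (a keyword rule here, so the example runs offline), and a tiny stdlib naive Bayes classifier substitutes for the small Hugging Face model the comment suggests. The point is the shape: pay the large model once per training example, then run only the cheap student at inference time.

```python
import math
from collections import Counter, defaultdict


def hypothetical_llm_label(text: str) -> str:
    """Hypothetical stand-in for a large hosted model acting as labeler.

    A real pipeline would send `text` to an LLM with a labeling prompt;
    a keyword rule plays that role here so the sketch runs offline.
    """
    negative_words = {"broken", "refund", "late", "bad"}
    return "neg" if set(text.lower().split()) & negative_words else "pos"


class TinyNaiveBayes:
    """Small, cheap student model trained only on the teacher's labels."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior + log likelihood with add-one (Laplace) smoothing
            score = math.log(count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


unlabeled = [
    "package arrived broken",
    "want a refund now",
    "great product works well",
    "delivery was late again",
    "love it works great",
]
# Step 1: pay the big model once to pre-label the data.
pseudo_labels = [hypothetical_llm_label(t) for t in unlabeled]
# Step 2: train the small model on those labels; inference is now local and cheap.
student = TinyNaiveBayes().fit(unlabeled, pseudo_labels)
print(student.predict("screen arrived broken"))
```

The student only ever sees the teacher's labels, so its quality is capped by how good those labels are.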

hazard02 t1_j46e13z wrote on January 13, 2023 at 2:12 PM

#1,350,154

Replying to chimp73 (#1,349,164)

I think one counter-argument is that Andrew Ng has said that there are profitable opportunities that Google knows about but doesn't go after simply because they're too small to matter to Google (or Microsoft or any megacorp), even though those opportunities are large enough to support a "normal size" business.

From this view, it makes sense to "outsource" the fine-tuning to businesses that are buying the foundational models because why bother with a project that would "only" add a few million/year in revenue?

Additionally, if the fine-tuning data is very domain-specific or proprietary (e.g. your company's customer service chat logs), then the foundation model providers might literally not be able to do it.

Having said all this, I certainly expect a small industry of fine-tuning consultants/tooling/etc. to grow over the coming years.

ghostfuckbuddy t1_j46eikm wrote on January 13, 2023 at 2:16 PM

#1,350,189

Replying to chimp73 (#1,349,164)

The compute is cheap but the data may not be easily accessible.

nohat t1_j46fofr wrote on January 13, 2023 at 2:24 PM

#1,350,264

That’s literally just the original bitter lesson.

RomanRiesen t1_j46ixvh wrote on January 13, 2023 at 2:47 PM

#1,350,457

Replying to chimp73 (#1,349,164)

Counterpoint: markets that are small, specialised, and require tons of domain knowledge, e.g. training the model on Israeli law in Hebrew.

dimsycamore t1_j46jj4p wrote on January 13, 2023 at 2:51 PM

#1,350,498

Replying to mugbrushteeth (#1,349,261)

Already happening unfortunately

Nowado t1_j46klvj wrote on January 13, 2023 at 2:58 PM

#1,350,572

Replying to hazard02 (#1,350,154)

From this perspective you could say there are products that wouldn't make sense for Amazon to bother with. How's that working out?

hazard02 t1_j46mbb6 wrote on January 13, 2023 at 3:09 PM

#1,350,693

Replying to Nowado (#1,350,572)

Edit:
OK I had a snarky comment here, but instead I'd like to suggest that the business models are fundamentally different: Amazon sells products that they (mostly) don't produce, and offers a platform for third-party vendors. In contrast to something like OpenAI, they're an aggregator and an intermediary.

thedabking123 t1_j46pulo wrote on January 13, 2023 at 3:33 PM

#1,350,920

Replying to L43 (#1,349,196)

The one thing that could blow all this up is requirements for explainability, which could push the industry toward low-cost (but maybe low-performance) methods like neurosymbolic computing, whose predictions are much more understandable and explainable.

I can imagine an incident involving self-driving cars (or LegalTech, or HealthTech) where a terrible prediction has real consequences. That would drive a public backlash against unexplainable models, and maybe laws against them too.

Lastly, this would make deep learning models and LLMs less attractive if they fall under new regulatory regimes.

ThirdMover t1_j46t3fc wrote on January 13, 2023 at 3:53 PM

#1,351,136

Replying to hazard02 (#1,350,693)

I think the point of the metaphor was Amazon stealing product ideas from third party vendors on their site and undercutting them. They know what sells better than anyone and can then just produce it.

If Google or OpenAI offers people the opportunity to finetune their foundation models they will know when something valuable comes out of it and simply replicate it then. There is close to zero institutional cost for them to do so.

That's a reason why I think all these startups that want to build business models around ChatGPT are insane: if you do it and it actually turns out to work OpenAI will just steal your lunch and you have no way of stopping that.

mgostIH t1_j46xnpc wrote on January 13, 2023 at 4:22 PM

#1,351,421

The real bitter lesson is how Stanford got so many authors cited for introducing nothing but a less descriptive name than "large models".

Farconion t1_j46yl1j wrote on January 13, 2023 at 4:28 PM

#1,351,488

seems a bit premature since foundation models have only been around for 3-5 years

currentscurrents t1_j4702g0 wrote on January 13, 2023 at 4:37 PM

#1,351,596

Replying to mugbrushteeth (#1,349,261)

Compute is going to get cheaper over time though. My phone today has the FLOPs of a supercomputer from 1999.

Also if LLMs become the next big thing you can expect GPU manufacturers to include more VRAM and more hardware acceleration directed at them.

currentscurrents t1_j4716tp wrote on January 13, 2023 at 4:44 PM

#1,351,661

Replying to ml-research (#1,348,832)

Try to figure out systems that can generalize from smaller amounts of data? It's the big problem we all need to solve anyway.

There are a bunch of promising ideas that need more research:

Neurosymbolic computing
Expert systems built out of neural networks
Memory augmented neural networks
Differentiable neural computers

Nowado t1_j4723n6 wrote on January 13, 2023 at 4:50 PM

#1,351,722

Replying to ThirdMover (#1,351,136)

That was precisely the point.

Amazon started as a sales service and then moved to become a platform. Once it was a platform, everyone assumed the sales business was too small for them.

And then they started to cannibalize businesses using their platform.

shmageggy t1_j4735jr wrote on January 13, 2023 at 4:56 PM

#1,351,794

Replying to Farconion (#1,351,488)

seems a bit obvious since foundation models have already been around for 3-5 years

rafgro t1_j47678z wrote on January 13, 2023 at 5:15 PM

#1,351,995

Replying to nohat (#1,350,264)

See, it's not bitter lesson 1.0 when you replace "leverage computation" with "leverage large models that require hundreds of GPUs and entire internet". Sutton definitely did not write in his original essay that every bitter cycle ends with:

>breakthrough progress eventually arrives by an approach based on scaling computation

Phoneaccount25732 t1_j477kis wrote on January 13, 2023 at 5:23 PM

#1,352,089

Replying to hazard02 (#1,350,154)

The reason Google doesn't bother is that they are aggressive about acquisitions. They're outsourcing the difficult risky work.

DisWastingMyTime t1_j47ans8 wrote on January 13, 2023 at 5:41 PM

#1,352,325

Replying to thedabking123 (#1,350,920)

In vision/robotics this is already the case: low hardware/low cost requirements are an incredible selling point for the automotive industry, so large, disgusting models are out.

But we still use deep models. If anything, it's pretty surprising how much is possible with "shallow" models in specialized domains, but that's still very far from explainable models.

WokeAssBaller t1_j47ao05 wrote on January 13, 2023 at 5:41 PM

#1,352,328

Nah foundational models will be replaced with distributed ones

RandomCandor t1_j47bx4j wrote on January 13, 2023 at 5:49 PM

#1,352,413

Replying to currentscurrents (#1,351,596)

To me, all that means is that lay people will always be a generation behind what the rich can afford to run.

anonsuperanon t1_j47g6e3 wrote on January 13, 2023 at 6:15 PM

#1,352,702

Replying to mugbrushteeth (#1,349,261)

Literally just the history of all technology, which suggests saturation given enough time.

granddaddy t1_j47hbby wrote on January 13, 2023 at 6:22 PM

#1,352,757

Replying to chimp73 (#1,349,164)

This guy makes a similar comparison in his blog but goes into a bit more detail than the tweet.

https://trees.substack.com/p/false-dichotomy-and-disillusion-in

Is it worth creating your own models or extensively fine-tuning foundational models? Probably not.

lookatmetype t1_j47o3hu wrote on January 13, 2023 at 7:03 PM

#1,353,175

Replying to nohat (#1,350,264)

yeah i'm lost because i literally don't understand the distinction

psychorameses t1_j47q301 wrote on January 13, 2023 at 7:15 PM

#1,353,281

This is why I hang my hat on software engineering. You guys can fight over who has the better data or algorithms or more servers. Ultimately y'all need stuff to be built, and that's where I get paid.

KhurramJaved t1_j47qiu0 wrote on January 13, 2023 at 7:18 PM

#1,353,316

Seems like a fairly contrived take. The bitter lesson is about a general principle (algorithms that scale well with more data and compute win), whereas the foundation model regime (pre-train a model on a large dataset, then either fine-tune it or use its features for downstream tasks) is a very specific way of leveraging data and compute. I see little reason why other regimes of using large amounts of data and compute might not be better.

Based on my own research, my prediction is that foundation models will die out for robotics once we have scalable online continual learners. Extremely large models that are always learning in real-time would replace the foundation models paradigm.

make3333 t1_j47zeza wrote on January 13, 2023 at 8:13 PM

#1,353,802

Replying to chimp73 (#1,349,164)

And often you don't even need to fine-tune, thanks to instruction pre-training and few-shot prompting.
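
For readers unfamiliar with few-shot prompting, here is a minimal sketch of assembling such a prompt: an instruction, a few worked examples, then the new input, with no weight updates at all. The function name and the `Input:`/`Output:` layout are illustrative assumptions, not a format any particular model requires.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, k worked examples, and the new input.

    The "Input:"/"Output:" layout is an illustrative assumption; the model
    is expected to continue the text after the final "Output:".
    """
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between examples
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Arrived broken.", "negative"), ("Works perfectly.", "positive")],
    "Battery died after a day.",
)
print(prompt)
```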

pm_me_your_pay_slips t1_j486wz7 wrote on January 13, 2023 at 9:00 PM

#1,354,250

Since scaling laws and foundational models are mainstream now, to whom is this "Bitter lesson 2.0" addressed?

pm_me_your_pay_slips t1_j48741u wrote on January 13, 2023 at 9:01 PM

#1,354,259

Replying to chimp73 (#1,349,164)

The bitter lesson will be when fine-tuning and training from scratch become the same thing.

pm_me_your_pay_slips t1_j487k7k wrote on January 13, 2023 at 9:04 PM

#1,354,289

Replying to Farconion (#1,351,488)

Foundation models are mainstream now. Look at the curriculum of any top ML program; they all have a class on scaling laws and big models.

pm_me_your_pay_slips t1_j488487 wrote on January 13, 2023 at 9:07 PM

#1,354,319

Replying to psychorameses (#1,353,281)

Except one software engineer + a foundation model for code generation may be able to replace 10 engineers. I'm taking that ratio out of my ass, but it might as well be that one engineer + foundation model replaces 5 or 100. Do you count yourself as that one in X engineers that won't lose their job in Y years?

currentscurrents t1_j48csbo wrote on January 13, 2023 at 9:37 PM

#1,354,540

Replying to RandomCandor (#1,352,413)

If it is true that performance scales infinitely with compute power - and I kinda hope it is, since that would make superhuman AI achievable - datacenters will always be smarter than PCs.

That said, I'm not sure that it does scale infinitely. You need not just more compute but also more data, and there's only so much data out there. GPT-4 reportedly won't be any bigger than GPT-3 because even terabytes of scraped internet data isn't enough to train a larger model.
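
To put the data constraint in numbers: a commonly cited rule of thumb from DeepMind's Chinchilla work (Hoffmann et al., 2022) is that compute-optimal training uses roughly 20 tokens per parameter, and training compute is often approximated as 6 × parameters × tokens. Both are approximations rather than laws, and the sketch below simply applies them. For comparison, GPT-3's 175B parameters were trained on about 300B tokens.

```python
# Rule-of-thumb constants (approximations, not exact laws).
TOKENS_PER_PARAM = 20       # rough compute-optimal ratio reported for Chinchilla
FLOPS_PER_PARAM_TOKEN = 6   # common estimate: ~6 FLOPs per parameter per training token


def compute_optimal_tokens(n_params):
    """Training tokens suggested by the ~20 tokens/parameter heuristic."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params, n_tokens):
    """Approximate total training compute: C ~ 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


for n_params in (1.3e9, 175e9, 1e12):
    tokens = compute_optimal_tokens(n_params)
    flops = training_flops(n_params, tokens)
    print(f"{n_params:.1e} params -> {tokens:.1e} tokens, {flops:.1e} FLOPs")
```

Under this heuristic a 175B-parameter model would want about 3.5 trillion tokens, and a trillion-parameter model about 20 trillion, which is why the supply of usable text becomes the binding constraint.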

Farconion t1_j48e9q4 wrote on January 13, 2023 at 9:46 PM

#1,354,622

Replying to pm_me_your_pay_slips (#1,354,289)

bitter lesson 1.0 was made in regard to 70 years of AI history

pm_me_your_pay_slips t1_j48k2ve wrote on January 13, 2023 at 10:23 PM

#1,354,919

Replying to Farconion (#1,354,622)

I guess so, there's nothing bitter in this so-called "bitter lesson 2.0"

psychorameses t1_j48la7w wrote on January 13, 2023 at 10:31 PM

#1,354,991

Replying to pm_me_your_pay_slips (#1,354,319)

For now, yeah. I'm the guy building their fancy hodgepodge theoretical linear algebra functions into efficient PyTorch backend code so it can actually do something. And the CI/CD pipelines, the serving systems and all of that. You could even say I'm contributing to the demise of those 10 engineers. Especially all the JavaScript bootcamp CRUD engineers flooding NPM with god-knows-what these days.

Gotta back the winning side, not fight them. If foundation models get replaced by something else, I'll go build software for those guys and gals too.

BarockMoebelSecond t1_j48mepq wrote on January 13, 2023 at 10:39 PM

#1,355,056

Replying to RandomCandor (#1,352,413)

Which is and has been the status quo for the entire history of computing, so I don't see how that's a new development?

boss_007 t1_j48qyxu wrote on January 13, 2023 at 11:09 PM

#1,355,288

Replying to ml-research (#1,348,832)

You don't have a dedicated TPU cluster in your lab? Pffftt

Arktur t1_j48rwm7 wrote on January 13, 2023 at 11:16 PM

#1,355,327

Replying to chimp73 (#1,349,164)

That’s not the bitter lesson, that’s just capitalism.

notdelet t1_j48yvht wrote on January 14, 2023 at 12:04 AM

#1,355,680

Hot take: "foundation models" is pure branding, so if they say it's foundation models, it will be foundation models that we're all using.

currentscurrents t1_j490rvn wrote on January 14, 2023 at 12:18 AM

#1,355,793

Replying to BarockMoebelSecond (#1,355,056)

It's meaningful right now because there's a threshold where LLMs become awesome, but getting there requires expensive specialized GPUs.

I'm hoping in a few years consumer GPUs will have 80GB of VRAM or whatever and we'll be able to run them locally. While datacenters will still have more compute, it won't matter as much since there's a limit where larger models would require more training data than exists.
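
A rough back-of-envelope for what 80GB of VRAM buys, assuming weights dominate memory. Activations, the KV cache and runtime overhead are ignored, so real requirements are higher. The bytes-per-parameter table covers common precisions, and 175B is GPT-3's published parameter count.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in decimal GB (overheads ignored)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def max_params_that_fit(vram_gb, dtype):
    """Largest parameter count whose weights fit in `vram_gb` (weights only)."""
    return vram_gb * 1e9 / BYTES_PER_PARAM[dtype]


print(weight_memory_gb(175e9, "fp16"))            # 350.0 (GB)
print(max_params_that_fit(80, "fp16") / 1e9)      # 40.0 (billion parameters)
print(max_params_that_fit(80, "int4") / 1e9)      # 160.0 (billion parameters)
```

By this estimate an 80GB card would hold the weights of a model around 40B parameters at fp16, and quantization stretches that further.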

sabetai t1_j49eq10 wrote on January 14, 2023 at 2:04 AM

#1,356,634

Replying to chimp73 (#1,349,164)

API devs haven't been able to use GPT3 effectively, and will likely be competed away by more product-like releases like ChatGPT.

bloc97 t1_j49ft0g wrote on January 14, 2023 at 2:12 AM

#1,356,702

Replying to mugbrushteeth (#1,349,261)

My bet is on "mortal computation" (a term coined by Hinton). Our current methods for training deep nets are extremely inefficient: CPUs and GPUs have to load data, process it, then write it back to memory. We could eliminate this bandwidth limitation by fabricating a very large differentiable memory cell whose physical connections represent the connections between neurons, which would allow us to do inference or backprop in a single step.

Playful_Ad_7555 t1_j49k8p2 wrote on January 14, 2023 at 2:47 AM

#1,356,920

Replying to currentscurrents (#1,351,596)

Silicon computing is already very close to its limit based on foreseeable technology. The exponential explosion in computing power and available data from 2000 to 2020 isn't going to be replicated.

Smallpaul t1_j4a0daf wrote on January 14, 2023 at 5:07 AM

#1,357,833

Replying to RomanRiesen (#1,350,457)

How many team members would it take to grab ChatLawGPT and feed it tons of Hebrew content? Isn't the whole point that it can learn domain knowledge?

Smallpaul t1_j4a15b8 wrote on January 14, 2023 at 5:14 AM

#1,357,877

Replying to nohat (#1,350,264)

The first bitter lesson was "people who focused on 'more domain-specific algorithms' lost out to the people who just waited for massive compute power to become available." I think the second bitter lesson is intended to be Robotics-specific and it is "people who focus on 'robotics-specific algorithms' will lose out to the people who leverage large foundation models from non-robotics fields, like large language models."

weightloss_coach t1_j4a2sx8 wrote on January 14, 2023 at 5:31 AM

#1,357,973

Replying to chimp73 (#1,349,164)

It’s like saying the creators of databases will create all SaaS products.

For end user, many more things matter

gdiamos t1_j4a96pu wrote on January 14, 2023 at 6:42 AM

#1,358,257

Replying to mugbrushteeth (#1,349,261)

Currently we have exascale computers, e.g. 1e18 FLOPS at around 50e6 watts.

The power output of the sun is about 4e26 watts. That's 20 orders of magnitude on the table.

This paper claims that energy of computation can theoretically be reduced by another 22 orders of magnitude. https://arxiv.org/pdf/quant-ph/9908043.pdf

So physics (our current understanding) seems to allow at least 42 orders of magnitude bigger (computationally) learning machines than current generation foundation models, without leaving this solar system, and without converting mass into energy...
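
Re-running the comment's arithmetic with its own figures (1e18 FLOPS at 50e6 W, 4e26 W of solar output, and the 22 orders of magnitude it attributes to the linked paper) makes a quick sanity check. The ratio 4e26 / 5e7 is 8e18, just under 19 orders of magnitude, so the total comes to about 41 rather than 42. The broader point about enormous physical headroom is unaffected.

```python
import math

# Figures as stated in the comment above (not independently verified here).
exascale_flops = 1e18
exascale_watts = 50e6
sun_watts = 4e26
claimed_efficiency_gain_oom = 22  # orders of magnitude, per the comment's reading of the paper

# Headroom from scaling power alone, at today's FLOPS-per-watt.
power_headroom_oom = math.log10(sun_watts / exascale_watts)
sun_flops_at_current_efficiency = sun_watts * exascale_flops / exascale_watts
total_oom = power_headroom_oom + claimed_efficiency_gain_oom

print(f"power headroom: {power_headroom_oom:.1f} orders of magnitude")  # 18.9
print(f"sun-powered at today's efficiency: {sun_flops_at_current_efficiency:.0e} FLOPS")
print(f"combined: {total_oom:.1f} orders of magnitude")  # 40.9
```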

Opposite-Platypus-99 t1_j4ahpg6 wrote on January 14, 2023 at 8:32 AM

#1,358,584

Replying to currentscurrents (#1,351,596)

now, can you confirm you can run arbitrary software on your phone?

Dwood15 t1_j4as7tk wrote on January 14, 2023 at 10:56 AM

#1,358,965

Replying to pm_me_your_pay_slips (#1,354,919)

The bitter, bitter scaling law: "More compute makes a lot more possible"

Illustrious_Mix_894 t1_j4b10ri wrote on January 14, 2023 at 12:46 PM

#1,359,378

What if we used the same amount of compute for approaches like Monte Carlo methods in limited-data domains?

moschles t1_j4nch5w wrote on January 16, 2023 at 10:48 PM

#1,380,145

> Seems to be derived by observing that the most promising work in robotics today (where generating data is challenging) is coming from piggy-backing on the success of large language models (think SayCan etc).

There is nothing really magical being claimed here. LLMs undergo unsupervised training, essentially by creating distortions of the text and learning to recover the original. (One type of "distortion" is Cloze Deletion, but there are others in the panoply of distorted text.)

Unsupervised training avoids the bottleneck of having to manually pre-label your dataset.

When we translate unsupervised training to the robotics domain, what does that look like? Perhaps "next word prediction" is analogous to "next second prediction" of a physical environment. And Cloze Deletion has an analogy to probabilistic "in-painting" done by existing diffusion models.
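
The two self-supervised objectives mentioned above can be sketched on a toy token list. This is illustrative only: real LLM pipelines work on subword tokens, and BERT-style masking (whose 15% rate is used as the default here) sometimes replaces or keeps the chosen token instead of always inserting `[MASK]`. The labels come from the data itself, which is what removes the manual-labeling bottleneck.

```python
import random


def next_token_pairs(tokens):
    """Next-word prediction: every prefix is an input, the following token is its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def cloze_pairs(tokens, mask_prob=0.15, mask="[MASK]", seed=0):
    """Cloze deletion: hide random tokens; the hidden tokens become the labels.

    Returns the corrupted sequence and a {position: original_token} dict.
    """
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(mask)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


sentence = "the robot picks up the red block".split()
print(next_token_pairs(sentence)[:2])
print(cloze_pairs(sentence, mask_prob=0.3))
```

The same `next_token_pairs` function works unchanged on a list of per-timestep sensor observations, which is one concrete reading of the "next second prediction" analogy.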

That's the way I see it. I'm not particularly sold on the idea that the pretraining would be a literal LLM trained on text, ported seamlessly to the robotics domain. If I'm wrong, set me straight.

moschles t1_j4nczb1 wrote on January 16, 2023 at 10:52 PM

#1,380,179

Replying to pm_me_your_pay_slips (#1,354,250)

Or worse, is "Foundation Model" just a contemporary buzzword replacement for unsupervised training?

GPT-5entient t1_j4s8q64 wrote on January 17, 2023 at 10:14 PM

#1,389,899

Replying to ThirdMover (#1,351,136)

>I think the point of the metaphor was Amazon stealing product ideas from third party vendors on their site and undercutting them. They know what sells better than anyone and can then just produce it.

In many cases they are probably just selling the same white label item outright, just slapping on "Amazon Basics"...

fullouterjoin t1_j4vbawe wrote on January 18, 2023 at 2:49 PM

#1,395,778

Replying to thedabking123 (#1,350,920)

> requirements for explainability

We have to start pushing for this legislation now. If you leave it up to the market, Equifax will just make a magic Credit Score model that will be like huffing tea leaves.
