Submitted by hardmaru t3_yxf875 in MachineLearning
dat_cosmo_cat t1_iwpav2u wrote
> [this year], large language models (LLMs) finally got good.
Every year since 2018 it's been déjà vu with this shit.
yldedly t1_iwpdtne wrote
"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's train a bigger model."
"Wow, this test accuracy is way better!" "Ok, how does it do on OOD data?" "Hmm, not great. Let's..."
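[Editor's note: the in-distribution vs. out-of-distribution (OOD) gap being joked about here can be sketched with a toy example. Everything below is synthetic and hypothetical — a trivial nearest-centroid "model" standing in for a real one — just to show how test accuracy on held-out in-distribution data can look fine while accuracy on shifted data drops.]

```python
import random

random.seed(0)

def sample(mean_a, mean_b, n):
    """Two 1-D classes, roughly Gaussian, separated along one feature."""
    data = [(random.gauss(mean_a, 1.0), 0) for _ in range(n)]
    data += [(random.gauss(mean_b, 1.0), 1) for _ in range(n)]
    return data

train    = sample(0.0, 3.0, 500)
test_iid = sample(0.0, 3.0, 500)  # same distribution as training
test_ood = sample(1.0, 4.0, 500)  # both class means shifted: covariate shift

# "Training": place a decision threshold at the midpoint of the class means.
mean0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
mean1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
threshold = (mean0 + mean1) / 2

def accuracy(data):
    correct = sum((x > threshold) == (y == 1) for x, y in data)
    return correct / len(data)

print(f"in-distribution accuracy: {accuracy(test_iid):.2f}")
print(f"OOD accuracy:             {accuracy(test_ood):.2f}")
```

On this setup the in-distribution test accuracy comes out noticeably higher than the OOD accuracy, because the learned threshold no longer sits between the shifted class means — the same pattern the dialogue above is mocking, at cartoon scale.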
Dankmemexplorer t1_iwqebgp wrote
true, they do keep getting gooder and people are like "we solved it this year"
i think it got good enough for most things in 2020 with GPT-3, much like DALL-E / Stable Diffusion is good enough for most things now
dat_cosmo_cat t1_iwqnbt1 wrote
The ubiquity of pretrained BERT + ResNet models in commercial software applications (and the measurable lift they deliver) is proof that they've been "good enough" for years. Sometimes these articles can come off a bit naive to the impact that the technology has already had or how widely it is used beyond the specific application that is most observable / accessible to the author.
---AI--- t1_iwsg3qp wrote
How does this shit get upvoted? There's a very clear, constant improvement. The models are objectively better each year.
dat_cosmo_cat t1_iwt6yam wrote
Because LLMs are incrementally better each year, not fundamentally better. The claim that they are *now* finally useful has become a cliché within the field of ML.
---AI--- t1_iwtd2xx wrote
But they are useful. Look at the thousands of real-world uses. Look at Grammarly, translation, protein folding, and so on. How can you possibly deny it?!
> not fundamentally better
In just the last two years, the models went from scoring 43 to 75 on this benchmark. How much more of a fundamental improvement are you after?!
dat_cosmo_cat t1_iwteguv wrote
You and I are literally saying the same things. These models have been in prod on every major software platform since BERT.
We don't even need to look at offline eval metrics anymore. If you're an actual MLE / data scientist you likely have the pipelines set up which directly measure the engagement / attributable sales differences and report the real business impact across millions of users each time a new model is released.
I work on a team that has made millions of dollars building applications on top of LLMs since 2018, so when I see the claim "LLMs finally got good this year" it's hard not to laugh. That's what I'm getting at.
Edit: did you read the article?