piyabati t1_iybj44z wrote

Data scarcity is a problem of methods, not data.

Starting about a decade ago, cheap hardware made it possible to train on vast datasets, allowing for models with more degrees of freedom. These models in turn created demand for massive amounts of human-labeled data. It's questionable whether all this crunching has led to an improved understanding of the world, although we now have machines that can mimic humans far better than they used to. The whole exercise of iterating over ever-bigger models and ever-bigger data, without any increase in fundamental scientific understanding, feels as pointless as bitcoin mining.

What is holding back AI/ML is continuing to define intelligence the way Turing did back in 1950 (making machines that can pass as human), and chasing big data, especially human-labeled data with its attendant subjectivity and pointlessness. Essentially, we are getting hung up on local minima in the search for intelligence.

7

currentscurrents t1_iybz6a1 wrote

I do agree that current ML systems require much larger datasets than we would like. I doubt the typical human hears more than a million words of English in their childhood, but they know the language much better than GPT-3 does after reading billions of pages of it.
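To put the gap in rough numbers (a back-of-envelope sketch; the 300 billion token figure is the training set size reported in the GPT-3 paper, and the childhood figure is just my guess above):

```python
# Rough arithmetic only; the child estimate is the guess above, and
# 300 billion tokens is the training set size reported for GPT-3.
child_words = 1_000_000          # words a child might hear growing up (guess)
gpt3_tokens = 300_000_000_000    # GPT-3 training tokens (Brown et al., 2020)

print(f"GPT-3 saw roughly {gpt3_tokens // child_words:,}x more text")
# -> GPT-3 saw roughly 300,000x more text
```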

> What is holding back AI/ML is to continue to define intelligence the way Turing did back in 1950 (making machines that can pass as human)

But I don't agree with this. Nobody is seriously using the Turing test anymore; these days AI/ML is about concrete problems and specific tasks. The goal isn't to pass as human, it's to solve whatever problem is in front of you.

8

yldedly t1_iydiq69 wrote

>The goal isn't to pass as human, it's to solve whatever problem is in front of you.

It's worth disambiguating between solving specific business problems, and creating intelligent (meaning broadly generalizing) programs that can solve problems. For the former, what Francois Chollet calls cognitive automation is often sufficient, if you can get enough data, and we're making great progress. For the latter, we haven't made much progress, and few people are even working on it. Lots of people are working on the former, and deluding themselves that one day it will magically become the latter.
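As a toy illustration of that distinction (my sketch, not Chollet's; assumes NumPy), a model with plenty of degrees of freedom can nail the training distribution while failing completely outside it:

```python
import numpy as np

# Toy illustration: a flexible model fits the training distribution
# ("cognitive automation") but falls apart outside it -- it hasn't
# learned anything that broadly generalizes.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 50)
y_train = np.sin(3 * x_train) + 0.05 * rng.normal(size=50)

coeffs = np.polyfit(x_train, y_train, deg=9)   # plenty of degrees of freedom

x_in = np.linspace(-1, 1, 100)                 # inside the training range
x_out = np.linspace(2, 3, 100)                 # outside it
err_in = np.abs(np.polyval(coeffs, x_in) - np.sin(3 * x_in)).mean()
err_out = np.abs(np.polyval(coeffs, x_out) - np.sin(3 * x_out)).mean()

print(f"in-distribution error:     {err_in:.3f}")   # small
print(f"out-of-distribution error: {err_out:.2e}")  # explodes
```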

1

piyabati t1_iydpx86 wrote

The hottest problems in NLP, computer vision, even self-driving cars, are almost solely defined in terms of how well a machine can mimic a human.

1

Desperate-Whereas50 t1_iye5kfo wrote

>I doubt the typical human hears more than a million words of english in their childhood, but they know the language much better than GPT-3 does after reading billions of pages of it.

But is this a fair comparison? I am far from being an expert in evolution, but I assume we have some evolutionarily encoded bias that makes language easier to learn, whereas ML systems have to begin from zero.

1

currentscurrents t1_iye68b8 wrote

Well, fair or not, it's a real challenge for ML since large datasets are hard to collect and expensive to train on.

It would be really nice to be able to learn generalizable ideas from small datasets.

1

Desperate-Whereas50 t1_iye7hf3 wrote

That's correct. But to define the bare minimum, you need a baseline. I just wanted to say that humans are a bad baseline because we have "training data" encoded in our DNA. Further, on tabular data, ML systems often outperform humans with far less training data.
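As a toy sketch of what such an encoded bias buys you (my illustration, assuming NumPy and SciPy; the target function and numbers are made up): a model that already "knows" the answer is periodic recovers it from a handful of points, while a generic fit does not:

```python
import numpy as np
from scipy.optimize import curve_fit

# Toy sketch of an "encoded bias": if the model's built-in prior matches
# the data (periodicity here), five samples are enough; a generic model
# fits the samples but not the function.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 4, 5))              # only 5 training points
y = np.sin(2 * x)

def periodic(x, a, w, p):                      # the built-in prior
    return a * np.sin(w * x + p)

# p0 near the truth keeps the toy fit simple
params, _ = curve_fit(periodic, x, y, p0=[1.0, 2.0, 0.0])

x_test = np.linspace(0, 8, 200)                # includes unseen territory
err_prior = np.abs(periodic(x_test, *params) - np.sin(2 * x_test)).mean()
err_generic = np.abs(np.polyval(np.polyfit(x, y, 4), x_test)
                     - np.sin(2 * x_test)).mean()

print(f"with periodic prior:  {err_prior:.3f}")    # near zero
print(f"generic degree-4 fit: {err_generic:.2e}")  # poor beyond the samples
```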

But of course, needing less data to get good training results is always better. I would not argue with that.

Edit: Typos

1

phobrain t1_iybum8b wrote

> hung up on local minima

We are the local minima that we seek.

1

kaskoosek t1_iyc6m6u wrote

I like how you think.

Though we are very far off from understanding consciousness.

I feel like what Roger Penrose is doing is closer to what you are describing.

Data science cares about output more than the science behind the human brain. Though I think neural networks are very interesting.

1