flexaplext t1_je7ampt wrote on March 29, 2023 at 10:42 PM

I think both this take and OP's are completely wrong. I'd say it's mostly non-useful data that's going to be lost.

People will still write Wikipedia articles. They just won't be read as much, but the data on the site will still be valid.

People will still ask questions on Stack Overflow, but there will be fewer of them, and the number of trivial questions will drop significantly because those are easier for AI to answer. Novel and difficult questions will still need to be asked on Stack Overflow, because AI isn't capable of answering them. And people will still want to answer the more novel questions and be interested in doing so.

Thus, the overall effect will actually be to significantly improve the data and engage people better. People who want to answer questions will enjoy the experience much more once the easy and obvious questions are removed, and they will be able to stumble across the interesting questions more easily and efficiently.

It should actually end up creating a much richer set of data for models to train on.

Think about it. If the only questions asked and answered on Stack Overflow were ones the models couldn't answer, that's literally the most perfect training data. It filters out the questions the model already knows the answer to, which aren't useful for it to read.

And this effect (I think it needs a name if it doesn't have one? The Flex Effect, if I just came up with it 😂😂) will only keep adapting over time as model output accuracy increases. As the model updates and its answers get better, so too will the questions and the subsequent answers. They'll get better and more and more difficult, matching the criteria for what the new model still needs to learn and train on.

Spetznaaz t1_je97vsp wrote on March 30, 2023 at 10:12 AM

Very interesting viewpoint. Certainly seems plausible.

nobodyisonething OP t1_je751vl wrote on March 29, 2023 at 10:01 PM

I'm expecting a predictable scenario like this:

1. The growth of freely available information on the internet slows down as proprietary AIs become the go-to for answers.
2. Proprietary AIs start actively trying to hide information behind paywalls to gain an advantage over their rivals.
3. The golden age of all-you-can-eat information is lost, and nobody realized it was happening.

Prestigious-Ad-761 t1_je7h76n wrote on March 29, 2023 at 11:31 PM

In my opinion, that part has already happened, except for a few last foci of resistance like Wikipedia. Google never gives you more than 400 results per search. And seeing as exact search has been disabled, we have effectively lost more than 99% of what was once the internet. Thank god they were forced to use the (real, old, now unavailable to us) internet to train the AIs. At least a little piece of it is saved.

But yeah... Paywalls.

VinoVeritable t1_je8is51 wrote on March 30, 2023 at 4:36 AM

Do you know why exact search has been disabled?

Prestigious-Ad-761 t1_jeb5dzx wrote on March 30, 2023 at 7:01 PM

I imagine the following combination of factors.

1. Most people were not educated about it and had no clue it existed, let alone how useful it can be. Very few users truly used it (according to what another search engine told me).
2. As a corollary of number 1, they felt they could pinch some pennies by removing that function.
3. When they did, also as a corollary of number 1, they noticed that the outrage was not loud enough to be a threat.
4. Apple removed the ratings system on the App Store so that rankings could be sold instead of deriving from user satisfaction.
5. Most people, being casuals, did not even notice it was gone. No public outrage.
6. Corollary of 5: Google's app store followed suit, then YouTube did too, and so did Tripadvisor and IMDb (for a while). Little by little, the ability to filter content according to user preference started to fade, and content started to be prioritised according to commercial guidelines rather than number of visitors or external links, perceived respectability, density of content and keyword relevance (which remain parameters of the algorithm, but much lower on the ladder than before). Well, keyword relevance is technically gone now that there's no exact search.
7. Google saw this and applied this commercial outlook to the search engine, not just the app store. They profited immensely.
8. Nowadays, the shape of the internet has changed from a planet to the tip of an iceberg; but many new users weren't born before these changes, many more never used those functions, and even fewer of them would care.

I imagine.

NoGravitasForSure t1_je8rquq wrote on March 30, 2023 at 6:21 AM

This discussion reminds me of the situation in the 90s. Around 1995, when the internet was slowly transforming from a toy for tech nerds into what it is today, there was much talk about commercialisation and how it would impact the freedom we had enjoyed so far.

Now we have paywall sites, but also Wikipedia, Stack Overflow and an abundance of free stuff, a lot more than back in the days when the internet was still a tiny playground.

So... I guess it is just impossible to predict what the future will bring, but I am not overly pessimistic.

jetro30087 t1_je8orbd wrote on March 30, 2023 at 5:44 AM

Wikipedia is updated with new information. Worst-case scenario, they just partner with AI firms to provide 'AI-ready' new articles for indexing. If a topic is not in the dataset, the AI still needs to go to internet websites to find the information.

CertainMiddle2382 t1_je8qe5r wrote on March 30, 2023 at 6:04 AM

Most data is already found on the web, I haven’t read a real investigation for years…

The Rise of AI will Crush The Commons of the Internet

BigZaddyZ3 t1_je74czz wrote on March 29, 2023 at 9:57 PM