Submitted by nobodyisonething t3_1261qk6 in singularity

Information sites that depend on traffic and volunteer insights from their visitors will start to see a very sharp decline in people volunteering answers.

Why? Because people will be getting their answers directly from AIs (think embedded ChatGPT, etc., with good results baked in).

How does this kill Wikipedia, StackOverflow, etc.? As the proprietary backends of ChatGPT and others collect (through their new plugins) the information that is already out there, they will start providing it back directly to their users. If you don't visit StackOverflow, you do not answer on StackOverflow.

Eventually, these rich sites of publicly visible information will grow stale like rarely visited museums of dust. What will we lose?

Medium --> ai-strip-mining-the-internet-fe19d8482b10

27

Comments

BigZaddyZ3 t1_je74czz wrote

🤔… That would be a very interesting dilemma, if true. Because it would also mean that future AIs won’t have as much new data to train on.

17

nobodyisonething OP t1_je751vl wrote

I'm expecting a predictable scenario like this:

  1. The growth of freely available information on the internet slows down as proprietary AIs become the go-to for answers.
  2. Proprietary AIs start actively trying to hide information behind paywalls to gain an advantage over their rivals.
  3. The golden age of all-you-can-eat information is lost and nobody realized it was happening.

12

flexaplext t1_je7ampt wrote

I think this and OP's take are completely wrong. I would say it's going to be mostly non-useful data that's lost.

People will still write Wikipedia articles, they just won't be read as much, but the data on the site will still be valid.

People will still ask questions on Stack Overflow, but there will be fewer questions, and the number of trivial questions will drop significantly because these are easier for AI to answer. Novel and difficult questions will still need to be asked on Stack Overflow, because AI isn't capable of answering them. And people will still want to answer the more novel questions and be interested in doing so.

Thus, the overall effect will actually be to significantly improve the data and engage people better. People wanting to answer questions will enjoy the experience much more with the easy and obvious questions removed. And they will be able to stumble across the interesting questions more easily and efficiently.

It should actually end up creating a much richer set of data for models to train on.

Think about it. If all the questions on Stack Overflow that are asked and answered were only questions that the models couldn't answer, that's like literally the most perfect training data. It filters out the questions the model already knows the answer to, which are not useful for it to read.

And this effect (I think it needs a name if it doesn't have one? - The Flex Effect if I just came up with it 😂😂) will only adapt over time with increased model output accuracy. As the model updates and its answers get better, so too will the questions, and subsequent answers. They'll get better and more and more difficult, matching the criteria for what the new model then still needs to learn and train on.

16

Prestigious-Ad-761 t1_je7h76n wrote

In my opinion that part happened already, except for a few last foci of resistance like Wikipedia. Google never gives you more than 400 results per search. And seeing as exact search has been disabled, we effectively lost more than 99% of what once was the internet. Thank god they were forced to use the (real, old, now unavailable to us) internet to train the AIs. At least a little piece of it is saved.

But yeah... Paywalls.

9

alexiuss t1_je7jc4d wrote

> with good results baked in

Checked the Medium article; the author doesn't know anything about anything.

"Good results baked in" is NOT how AIs work and not how you get the most optimal answer. Science is about discovering new truths and moving forward. The best thing about AIs is their creativity. The best answers come when they can cross-reference an answer with search and examine it logically, like a person arriving at a novel solution to a problem.

Hold up... What if we create AI that writes new articles on Wikipedia by observing the world? Hmmmm? If humans won't give us new free data because they're so busy playing with LLMs, open source AIs will!

AIs will contribute to the growth of the internet!

5

jetro30087 t1_je8orbd wrote

Wikipedia is updated with new information. Worst case scenario, they just partner with AI firms to provide 'AI ready' new articles for indexing. If a topic is not in the dataset, the AI still needs to go to websites on the internet to find the information.

2

CodingCook t1_je8pdff wrote

Honestly, in my experience of Stack Overflow and the absolute pig-headedness of some of its users, using something like GitHub Copilot purely for code-related questions would be worth the £10 a month. I would get instant and accurate answers instead of potentially getting my account temporarily banned because a question wasn’t considered good enough for them.

2

NoGravitasForSure t1_je8rquq wrote

This discussion reminds me of the situation in the 90s. Around 1995, when the internet slowly transformed from a toy for tech nerds into what it is today, there was much talk about commercialisation and how it would impact the freedom we had enjoyed so far.

Now we have paywall sites, but also Wikipedia, Stack Overflow and an abundance of free stuff, a lot more than back in the days when the internet was still a tiny playground.

So... I guess it is just impossible to predict what the future will bring, but I am not overly pessimistic.

4

UltimatePitchMaster t1_je8ti0m wrote

Much like the premise of the Dead Internet Theory, nearly all content in the future will be created by generative AIs. Models will learn from one another, but a valuable source of new data will come in the form of ratings from humans. People will respond to content, proving some pieces to be more valuable than others, and the AIs will learn to create content that would be popular and exceed user expectations. At that point, they would have limitless creativity. They would no longer require prompting from humans; they would just need examples of when humans responded positively.

5

byttle t1_jeaofw7 wrote

AI will start paying us for our opinions most likely. It’s going to want the truth with a capital T. Tested, trusted, trillions of times. How do you get truth nowadays?

1

byttle t1_jeata09 wrote

We currently pay Reddit or other organizations to promote our opinions, pay for upvotes, etc.

So in your scenario the AI is going to be paying for its posts to be highly upvoted and flaired?

I think the point I'd like to make is that it would be boring, and nonsensical to know you're viewing an only ai discussion of opinions about a movie you might wanna watch.

1

nobodyisonething OP t1_jeayozm wrote

> it would be boring, and nonsensical to know you're viewing an only ai discussion of opinions

I'm inclined to think I would not choose that -- but in the future, will the AI have an opinion worth engaging with? I already turn to ChatGPT for technical opinions -- and it has been very useful.

1

Prestigious-Ad-761 t1_jeb5dzx wrote

I imagine the following combination of factors.

  1. Most people were not educated about it and had no clue it existed, let alone how useful it can be. Very few users truly used it (according to what another search engine told me).

  2. As a corollary of number 1, they felt they could pinch some pennies by removing that function.

  3. When they did, also as a corollary of number 1, they noticed that the outrage was not in sufficient volume to be a threat.

  4. Apple removed the ratings system on the Apple store, in order to be able to sell rankings instead of deriving them from user satisfaction.

  5. Most people being casuals did not even notice that it was gone. No public outrage.

  6. Corollary of 5 - Google app store followed suit, then YouTube did too, so did Tripadvisor, IMDb (for a while). And little by little, the possibility to filter content according to user preference started to fade. And content started to be prioritised according to commercial guidelines and not number of visitors or external links, perceived respectability, density of content and keyword relevance (which remain parameters of the algorithm, but at a much lower rung in the ladder than before). Well, on keyword relevance, it's technically gone now that there's no exact search.

  7. Google saw this and applied this commercial outlook to the search engine, not just the app store. They profited immensely.

  8. Nowadays, the shape of the internet has changed from a planet to the tip of an iceberg; but many new users weren't born before these changes, many more did not use those functions and even fewer of them would care.

I imagine.

1