
ImSuperHelpful t1_j9apjid wrote

Your argument neglects the business side of the situation which explains the motivations to allow and disallow use in the two scenarios… if I run a content website, a search engine crawling the site so it can generate search results which send traffic to my site is beneficial to both parties, it’s symbiotic.

Alternatively, if I run a content site that an AI company crawls and then uses to train a model which then negates the need for my site to would-be visitors, it’s parasitic.

0

gurenkagurenda t1_j9avebb wrote

I'm not neglecting anything. I'm asking for some semblance of precision in defining model training out of fair use. The purpose and character of use, and the effect on the market are already factors in fair use decisions, but that's a lot more complicated of an issue than "AI models can't scrape content." It's specific to the application, and even for ChatGPT specifically, it would be pretty murky.

3

ImSuperHelpful t1_j9awd5k wrote

Except that’s what was missing from your original point, but either way I gave you a starting point… if it’s beneficial for both parties and both parties consent (which content site operators do via robots.txt instructions), no one has a problem. In the AI case it’s beneficial to the AI creator/owner but harmful to the content owner since the AI is competing with them by using their content, so it shouldn’t be considered fair use.

−1

gurenkagurenda t1_j9b9muc wrote

>Except that’s what was missing from your original point

Again, it's not missing from my original point, because my original point was to ask how the commenter above was distinguishing these cases. You've given a possible answer. That's an answer to my question, not a rebuttal.

I don't think that answer is very compelling, though. Arguing that an explicitly unreliable chat bot that hallucinates as often as it tells the truth is somehow a competitor to news media etc. is a tall order.

1

ImSuperHelpful t1_j9biixs wrote

I didn’t present it as a rebuttal, I added important context that was missing from your question that makes the answer much more clear.

And these things are unreliable now, but Microsoft and others are dumping billions of dollars into making them better and they’re doing it for profit. Waiting around until they’re perfected before fighting against the ongoing unfair use of copyrighted content is a surefire strategy for losing that fight.

1

gurenkagurenda t1_j9bl2kk wrote

What they're dumping money into now on this front are AI-enhanced search engines, which are complementary to the content they're training on.

1

ImSuperHelpful t1_j9brj6f wrote

That’s all they’ve launched on this front; that doesn’t mean it’s all they’re working on.

1

zutnoq t1_j9bhoq8 wrote

Search providers like Google don't just show you links, though. They also show you potentially relevant excerpts, so you often don't even need to go to the linked site to get what you were after, and they show previews of images in image search etc.

Determining exactly where to draw the line of what to consider fair-use for things like this is a highly complex and dynamic issue. Web search engines are (by necessity) parasitic as well but that alone neither makes them bad nor illegal.

Parasitic is also not the "bad" counterpart of symbiotic. A symbiotic relationship is simply a parasitic relationship that benefits both parties. Just saying parasitic says nothing about which side(s) would benefit. I think exploitative would be a more appropriate word to use for such relationships.

3

ImSuperHelpful t1_j9bqacf wrote

Those relevant excerpts and similar features have been pretty detrimental to search click-through rates in certain areas (they’re known as “no click” searches in the industry)… but the alternative is to block Google's bots entirely, which isn’t viable if you’re operating a content site since Google has an effective monopoly on search. Also, those features do still link out to the content they’re showing on the SERP, whereas the chat AI doesn’t and gives the appearance that it’s the source of the information.

Your point about vocabulary is fair

2