Comments


AbeWasHereAgain t1_je9mpyf wrote

I love that OpenAI uses a ton of other people's work to train their model, yet when someone uses OpenAI to train their model, they get all up in arms.

As far as I'm concerned, OpenAI has decided terms of use don't exist anymore.

131

YummyMummy2024 t1_je9pyvh wrote

Are you referring to other people's algorithms, or the content used to train the AI language model?

14

DorkRockGalactic t1_jeaaxw5 wrote

Not OP but I'd say they mean the content.

AI is trained using all the collective output of human beings on Reddit, Stackoverflow, etc.

We're not compensated for it, so why should anyone care if someone else uses the dataset they put together from that content?

28

YummyMummy2024 t1_jeadq3t wrote

The content is public and free to access? The algorithm is IP. I could be wrong though.

4

FrowntownPitt t1_jeanox9 wrote

Just because something is free to access doesn't mean you have the right to do whatever you want with it, especially with regards to making derivative works without attribution or otherwise breaking license terms. This is what licenses and copyrights are for.

For example, if OpenAI scraped a code repository that uses a Creative Commons NonCommercial license and is using that code for monetary gain without the owner's consent, they're breaking that license. It'd have to be argued whether the fact that OpenAI used that code to train models which may generate code of similar likeness counts as distributing the source, and whether having a user use that model under a paid service counts as a commercial violation of those terms.

The algorithm is IP, yes. But GPT-X is part model part training data.

19

YummyMummy2024 t1_jeaoukd wrote

No doubt those licenses were ignored, but without evidence how do you make that copyright claim? Without evidence, does that make it derivative? What do you think?

0

FrowntownPitt t1_jeaq30n wrote

I mean yeah, I agree, enforcing something like this is going to be very, very difficult. But there are several clear examples of something like DALL-E generating images very similar or nearly identical to copyrighted IP.

IANAL, but I presume a claimant could establish some reasonable certainty to a court that licensed works were used in a way that breaks the license, at which point OpenAI (or really any AI company) would be responsible for defending their practice or non-use of those licensed works.

5

ShadoWolf t1_jeb05vb wrote

You signed over your rights to your content when you signed up to Reddit, or Facebook, or Google.

It's not like OpenAI is using some shoestring-budget web scraper using Python and the BeautifulSoup library.

They have partnerships, and requested the raw text data.
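For context, the kind of "shoestring budget" scraper being dismissed here is only a few lines; a minimal sketch (the URL and CSS selector are placeholders for illustration, not anything OpenAI actually uses):

```python
# Minimal, hypothetical scraper of the kind dismissed above.
# Requires: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

def scrape_comments(url: str) -> list[str]:
    """Fetch a page and return the text of elements matching a placeholder selector."""
    resp = requests.get(url, headers={"User-Agent": "example-bot/0.1"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # ".comment" is a stand-in selector; real sites differ, and many forbid scraping in their TOS.
    return [el.get_text(strip=True) for el in soup.select(".comment")]

if __name__ == "__main__":
    for text in scrape_comments("https://example.com/some-thread"):
        print(text)
```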

−1

Particular-Way-8669 t1_jebcnyt wrote

You signed those rights away to those sites, not to OpenAI lol. It is still your IP. Someone can't just go and copy it because you posted it there; they'll be hit with an infringement lawsuit. Reddit, Facebook, and Google received your permission to use it in certain ways. And yes, Google or Facebook can potentially claim they used that data fairly for their models. OpenAI? Not a chance.

2

ShadoWolf t1_jebhlhj wrote

Unfortunately, you're wrong:

  1. Your Content

The Services may contain information, text, links, graphics, photos, videos, audio, streams, or other materials (“Content”), including Content created with or submitted to the Services by you or through your Account (“Your Content”). We take no responsibility for and we do not expressly or implicitly endorse, support, or guarantee the completeness, truthfulness, accuracy, or reliability of any of Your Content.

By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.

You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

Any ideas, suggestions, and feedback about Reddit or our Services that you provide to us are entirely voluntary, and you agree that Reddit may use such ideas, suggestions, and feedback without compensation or obligation to you.

Although we have no obligation to screen, edit, or monitor Your Content, we may, in our sole discretion, delete or remove Your Content at any time and for any reason, including for violating these Terms, violating our Content Policy, or if you otherwise create or are likely to create liability for us.

1

Particular-Way-8669 t1_jebiwfy wrote

Why do you even bother copying something without reading it?

"You retrain any ownership rights..."

End of story, I am right. It says exactly what I said it did. You grant Reddit (and only Reddit) the rights to use your content as written in the TOS. You do not grant it to anyone else. If Reddit partnered with someone, then they would also be included if Reddit gave them that right. But this is not what happened. OpenAI scraped the internet. There was no partnership with Reddit or anyone whatsoever.

2

ShadoWolf t1_jebs6px wrote

You retain ownership... but you more or less signed over all rights to what they can do with said information... it's right there in the highlighted text.

When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit << this is the part that lets them hand it over to companies like OpenAI

0

Particular-Way-8669 t1_jebu396 wrote

Again, if Reddit trained their own AI on users' data, or gave that data to OpenAI as part of a contract, then you would have a point. But this is not what happened. OpenAI did not ask anyone. They ran data-crawling scripts and stole data without asking. It is nothing like what Reddit is doing. You did not sign anything over to OpenAI.

2

johndburger t1_jebyej1 wrote

> this is the part that lets them hand it over to companies like OpenAI

Your claim is that OpenAI has negotiated usage rights from every single site it’s gotten data from? Do you have any evidence for this?

1

khamelean t1_jeaogv3 wrote

Just because something is free to access, doesn't mean you are allowed to remember it or learn from it in any way!!

−3

AbeWasHereAgain t1_jeat78b wrote

Go ask Vanilla Ice what happens when your music sounds a little too close to the original.

OpenAI and Microsoft are 100% violating the terms of use for the vast majority of the stuff they scraped.

6

khamelean t1_jeautqo wrote

All musicians learn from hearing other music.

There is a difference between learning and copying.

−3

No_Character_8662 t1_jeaxen9 wrote

So if I call something in my process "learning" I'm free to use it? I'm learning copies of your works on my printer to sell right now

Edit: to be clear I don't know what the answer is but that seems simplistic

7

Numai_theOnlyOne t1_jeb9lah wrote

Tbh, can we separate human learning from AI learning?

A human is an imperfect biological being that requires time and repetition to learn.

AI just needs a large pool of data and can do the same as millions of humans in a fraction of the time.

I think that's not the same kind of learning, and it's something that honestly should be questioned; after all, our content was created with humans in mind and was never meant to be used for AI.

5

khamelean t1_jeaztbb wrote

The learning isn’t the problem, the selling is.

−4

Newfondahloose t1_jeb49yx wrote

They are selling their own work. There’s only so many ways you can answer a question. Just because you’ve answered the question before, doesn’t mean someone else can’t come to the same conclusion when answering for themselves.

2

AbeWasHereAgain t1_jeaxmj7 wrote

lol - you don't think ChatGPT is spitting out insanely close replicas of other people's work daily?

4

khamelean t1_jeazo62 wrote

Nothing wrong with playing/singing other people’s songs, I sing along to the radio in my car all the time.

1

AbeWasHereAgain t1_jeazzbi wrote

ha ha ha - yeah, totally the same thing. Just an FYI, artists are required to pay when they do a cover.

Everything changes when you start making money off other people's work.

3

khamelean t1_jeb0muo wrote

That’s exactly my point.

1

AbeWasHereAgain t1_jeb0qkx wrote

What is your point?

2

khamelean t1_jeb2qq4 wrote

It’s not a problem until you start making money off other peoples work.

2

Space_Pirate_R t1_jeb6yrh wrote

Are monetized AI artists paying royalties to everyone whose art was scraped off the web?

1

khamelean t1_jebg7rh wrote

Are human artists paying royalties to everyone whose art they scraped off the web??

1

Space_Pirate_R t1_jebi0au wrote

Human artists learning from others' work is obviously "fair use." I don't think a corporation will successfully deploy that in defense of training a commercial AI.

1

khamelean t1_jebj0yq wrote

Just looking at a piece of art is enough to encode it into a human’s neural network. Why should it be any different for an artificial neural network? If it’s free to access then it’s free to access.

1

Space_Pirate_R t1_jebovpk wrote

I don't believe that an artificial neural network is morally or legally equivalent to a human. If I did believe that, then there would be more pressing issues than copyright infringement to deal with, such as corporate enslavement of AIs.

0

khamelean t1_jebrxm4 wrote

What does moral or legal equivalence to humans have to do with anything?

The point is that all AI has to do to learn from art is look at it. If someone makes their art free to look at, then it’s free for an AI to look at.

0

Space_Pirate_R t1_jebta1s wrote

AIs don't have agency. The AI is a tool which is being operated by a corporate entity. The corporate entity is governed by existing laws, and requires a license to use a copyright work in the operation of their business.

0

khamelean t1_jebxbzr wrote

So companies have to pay a licensing fee to every artist whose work the employees of that company have ever looked at?? Yeah, I don't think that's how it works.

0

Space_Pirate_R t1_jebzo2e wrote

No, because (as I mentioned earlier) there is a fair use exemption which allows humans to be educated using copyright works. However, there is no such exemption allowing corporations to train AI using copyright works.

0

khamelean t1_jec1fmg wrote

Education is irrelevant in this context. The copyrighted works people consume through education are a tiny fraction of the total number of copyrighted works that most people experience through their lives. And all of those experiences contribute to that person's capabilities.

The exemption for educational purposes is for presenting copyright material to students in an educational setting. It has nothing to do with copyright work that the student might seek out themselves.

0

Space_Pirate_R t1_jec5lal wrote

Yes, humans experience copyright works and learn from them, and that's fair use. What does that have to do with training an AI?

A person or corporation training an AI is covered by normal copyright law, which requires a license to use the work.

1

khamelean t1_jec837g wrote

How is it any different to an employee “using” the work? Corporations don’t pay licensing when an employee gets inspired by a movie they saw last night.

Why do you keep mentioning corporations? An AI could just as easily be trained by an individual. I’ve written and trained a few myself.

1

Space_Pirate_R t1_jec8j1p wrote

>Corporations don’t pay licensing when an employee gets inspired by a movie they saw last night.

The employee themselves paid to view the movie. The copyright owner set the amount of compensation knowing that the employee could retain and use the knowledge gained. No more compensation is due. This is nothing like a person or corporate entity using unlicensed copyright works to train an AI.

>Why do you keep mentioning corporations? An AI could just as easily be trained by an individual. I’ve written and trained a few myself.

Me too. I keep saying "person or corporation training an AI" to remind us that the law (and any moral judgement) applies to the person or corporate entity conducting the training, not to the AI per se, because the AI is merely a tool and is without agency of its own.

1

khamelean t1_jecbi7y wrote

"What does that have to do with a person or corporate entity training an AI?"

Training a human neural network is analogous to training an artificial neural network.

Whether the employee paid to watch a movie doesn't matter; they could have just as easily watched something distributed for free. The transaction to consume the content is, as you said, irrelevant to the corporation.

An AI consuming a copyright work is no different to a human consuming a copyright work. If that work is provided for free consumption, why would the owner of the AI have to pay for the AI to consume it?

1

Space_Pirate_R t1_jecfcfy wrote

>Training a human neural network is analogous to training an artificial neural network.

By definition, something analogous is similar but not the same. Lots of things are analogous to others, but that doesn't even remotely imply that they should be governed by the same laws and morality.

>An AI consuming a copyright work is no different to a human consuming a copyright work.

A human consuming food is no different to a dog consuming food. Yet we have vastly different laws governing human food compared to dog food. Dogs and AI are not humans, and that is the difference.

>If that work is provided for free consumption, why would the owner of the AI have to pay for the AI to consume it?

If that work is provided for free consumption, why would the owner of a building have to compensate the copyright owner to print a large high quality copy and hang it on a public wall in the lobby? The answer is that the person (not the AI) is deriving some benefit (beyond fair use) from their use of the copyrighted work, and therefore the copyright owner should be compensated.

1

khamelean t1_jecru6d wrote

The building owner is using a replication of the copyrighted work. The owner should absolutely compensate the original creator.

But the printing company that the building owner hires to print the poster doesn’t owe the original creator anything. Even though it is directly replicating copyrighted work, and certainly benefiting from doing so. If the printer were selling the copyrighted works directly then that would be a different matter and they would have to compensate the original copyright owner. So clearly context matters.

An AI doesn’t even make a replication of the original work as part of its training process.

If the AI then goes on to create a replication, or a new work that is similar enough to the original that copyright applies, and that work is intended to be used in a context where copyright would apply, then absolutely, that would constitute a breach of copyright.

It is the work itself that is copyrighted, not the knowledge/ability to create the work. It’s the knowledge of how to create the work which is encoded in the neural network.

Lots of people benefit from freely distributed content. Simply benefiting from it is not enough to justify requiring a license fee.

Hypothetically speaking, let's say a few years down the line we have robot servants. I have a robotic caregiver that assists me with mobility, much as I might have a human caregiver today.

If I go to the movies with my robot caregiver, they will take up a seat, so I would expect to pay for a ticket, just as I would for a human caregiver. Do I then need to pay an extra licensing fee for the robot's AI brain to actually watch the movie?

What if it’s a free screening? Should I still have to pay for the robot brain to “use” the movie?

Is the robot “using” the movie in some unique and distinct way compared to how I would be “using” the movie?

1

Newfondahloose t1_jeb3tmh wrote

It’s learning and using language to answer questions. There’s only so many ways you can answer the same question. Greed getting in the way of progress, as always. Guess professors should give a citation every time they give a verbal answer even though they are answering from memory.

0

Particular-Way-8669 t1_jebcdng wrote

There is a difference between a human, who can be creative, and a computer program that creates aggregations. Completely different thing. AI does not really learn. It adjusts its mathematical functions based on data.

1

khamelean t1_jebgej7 wrote

No, there is no difference. Creativity is just combination and random mutation. It’s how humans are creative, it’s how machines are creative. It’s the same thing.

1

Particular-Way-8669 t1_jebh4n3 wrote

This is utter bullshit. There was always some human who came up with something first, when there was nothing like it before. The AI technology we know does not have this ability, and never will. It is only data aggregation, nothing else. A human does not need data from other humans to be creative, and the very fact that someone climbed down from the trees and picked up the first fire is proof of that.

1

khamelean t1_jebi03n wrote

Combination + mutation. It allowed evolution through natural selection to give us every life form on earth. Creativity works exactly the same way.
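To make the "combination + mutation" mechanism concrete, here is a toy sketch of that loop as a genetic algorithm evolving a target string; purely illustrative of the process being invoked, not a claim about how ChatGPT or natural selection actually works:

```python
# Toy "combination + mutation" loop: evolve random strings toward a target.
import random
import string

TARGET = "creativity"
ALPHABET = string.ascii_lowercase

def fitness(s: str) -> int:
    """Count positions that already match the target."""
    return sum(a == b for a, b in zip(s, TARGET))

def crossover(a: str, b: str) -> str:          # combination
    cut = random.randrange(len(TARGET))
    return a[:cut] + b[cut:]

def mutate(s: str, rate: float = 0.1) -> str:  # random mutation
    return "".join(random.choice(ALPHABET) if random.random() < rate else c for c in s)

population = ["".join(random.choices(ALPHABET, k=len(TARGET))) for _ in range(100)]
for generation in range(1000):
    population.sort(key=fitness, reverse=True)
    if population[0] == TARGET:
        break
    parents = population[:20]                  # selection
    population = [mutate(crossover(*random.sample(parents, 2))) for _ in range(100)]

print(generation, max(population, key=fitness))
```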

1

johndburger t1_jeby23l wrote

ChatGPT has "learned" some generalizations from the text that it's processed, but it has also literally memorized (i.e. copied) billions of words from it.

1

Numai_theOnlyOne t1_jeb91dq wrote

Yeah, and it makes sense for humans, but I can see this being an issue with AI and how fast it can learn.

After all, suddenly whatever I posted anywhere is being used to generate revenue, when it was originally offered to people for free in exchange for responses for free. AI, though, usually requires you to pay for it. So why shouldn't they pay me to use my data? Sure, maybe someone made money from my response, and I might buy some of their stuff; that's fine, because it wasn't only because of my input, unlike AI, which only works because of the data. Same with artists. They were posting stuff for free not to be used for free, but to present their art and land a job. You also can't just rip an image from the internet and use it in a commercial because "it was freely available on the internet".

2

thurken t1_jeayknu wrote

We're talking about ethics here, not unethical legal loopholes

1

Newfondahloose t1_jeb4uk7 wrote

Ethics are different for everyone. I find it unethical to hold back society just because you want to be referenced or given 5 cents for your shitty, regurgitated blog post.

0

thurken t1_jeb60db wrote

That was kind of the opposite of the point: that OpenAI has some nerve being mad at Google for using ChatGPT to generate training data when they used everyone's data to get their own training data.

2

Particular-Way-8669 t1_jebc3ih wrote

Everything free to access that is not released under a copyright-friendly license is by definition the IP of whoever put it out. Even if you take a picture and put it on Facebook, it is your IP. Facebook might have a TOS that says they have the right to do certain things with what you post on their site. Sure. But you gave them permission by agreeing to it. OpenAI never received any permission from anyone. Period.

1

beingsubmitted t1_jecici1 wrote

The algorithm is barely IP, and the data is the bigger part of its success.

ChatGPT is a reinforcement-learning-tuned transformer. The ideas and architecture it's built on aren't proprietary. The specific parameters are (the size and number of layers, for example), but that's not actually that important. Most people in AI can make some assumptions: probably ReLU, probably Adam, etc. Then there are different knobs you can twiddle, and with some trial and error you dial it in.
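As a rough illustration of those "knobs", a minimal PyTorch sketch of the kind of architectural hyperparameters involved; the specific numbers are made up and far smaller than anything GPT-scale, and this encoder stack is a stand-in, not OpenAI's actual architecture:

```python
# Hypothetical, tiny configuration to illustrate the "knobs": layer count,
# model width, activation, optimizer. Real GPT-scale values are proprietary
# and orders of magnitude larger.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 256, 8, 4          # width, attention heads, depth
layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=n_heads,
    dim_feedforward=4 * d_model,
    activation="relu",                           # "probably ReLU"
    batch_first=True,
)
model = nn.TransformerEncoder(layer, num_layers=n_layers)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # "probably Adam"

# One dummy forward/backward pass on random data, just to show the plumbing.
x = torch.randn(2, 16, d_model)                  # (batch, sequence, features)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```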

The size and quality of your training data is way more important, and in the case of ChatGPT, so is your compute power. Lots of people can design a system that big; it's as easy as it is to come up with big numbers. But training it takes a ton of compute power, which costs money, which is why not just anyone has already done it, if it's so easy.

It should also be said that GPT is a bit of a surprise success. Before models this size, it was a big risk. You're gonna spend millions to train a model, and you won't know until it's done how good it will be.

Most advancements in AI are open source and public. Those all help advance the field, but at the same time, it's also about taking a bit of a risk, and waiting to see how it pans out before taking the next risk.

Also, there's transfer learning. If you spend a hundred million training a model, I can use your trained model and a fraction of the money to make my own.
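A minimal sketch of what that kind of transfer learning can look like in practice, using the Hugging Face transformers library and the public GPT-2 checkpoint; the fine-tuning corpus here is a placeholder, not any particular dataset:

```python
# Hypothetical transfer-learning sketch: start from someone else's pretrained
# weights and nudge them with a little of your own data.
# Requires: pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # reuse pretrained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

corpus = [                                             # stand-in for your own data
    "placeholder fine-tuning sentence one.",
    "placeholder fine-tuning sentence two.",
]

model.train()
for text in corpus:
    batch = tokenizer(text, return_tensors="pt")
    # Causal LM models compute the loss themselves when labels are provided.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```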

It's like if you laboriously took painstaking measurements to figure out an exact kilogram and craft a 1kg weight. You didn't invent the kilogram, difficult as it was to make it. If I use yours to make my own, I'm not infringing on your IP.

1

DoobieBrotherhood t1_jeb2kru wrote

Because logic is out the window when it comes to hypocrisy nowadays. If you think we should limit GHG emissions, you can’t use any form of energy. If you think Russia was wrong to invade Ukraine and commit genocide, then you cannot be a citizen of any country that has ever been in a war.

1

bookko t1_jeba5ac wrote

It'd be the same use case as academic books: the knowledge is everywhere, dating back to Pythagoras, but having it available in a usable form is where the crux lies.

1

ThrillShow t1_jeak2on wrote

In a shocking twist, an AI was built on data taken without permission.

8

NLwino t1_jeamkpr wrote

In a shocking twist, everyone using a browser takes the same data with the same permissions.

−3

malmode t1_jeankdl wrote

In a shocking twist, there is nothing new under the sun.

3

Space_Pirate_R t1_jeb7vmw wrote

In a shocking twist, posting data on social media constitutes implied permission for other users to process it in their browsers in order to read it.

However, in a second shocking twist, posting doesn't constitute implied permission for corporations to train AI with the contents of posts.

3

ThrillShow t1_jebdn9g wrote

I'm shocked by how many people unquestioningly accept the idea that AI should be entitled to the same rights as humans, as if a machine that scrapes huge portions of the internet for content is exactly the same as one person browsing.

3

NLwino t1_jeblcee wrote

What do you think search engines need to do to give you the results?
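For what it's worth, the core of what a search engine does with crawled pages is build an inverted index; a toy sketch (the documents are made up):

```python
# Toy inverted index: map each word to the documents it appears in,
# which is roughly what a search engine builds from crawled pages.
from collections import defaultdict

docs = {
    "page1": "openai trained a large language model",
    "page2": "google crawls and indexes the open web",
}

index: dict[str, set[str]] = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query: str) -> set[str]:
    """Return documents containing every query word."""
    words = query.lower().split()
    results = [index[w] for w in words if w in index]
    return set.intersection(*results) if results else set()

print(search("open web"))   # {'page2'}
```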

0

Space_Pirate_R t1_jebrw1m wrote

People making copyright work available on the internet are granting an implied permission for search engines to index their work, because that's pursuant to the normal purposes of posting on the internet. People make work available on the internet for the purpose of allowing others to find it using search engines and view it using browsers.

However, making copyright work available on the internet does not constitute an implied permission or license to do literally anything with the posted work. People don't usually post work on the internet for the purpose of helping corporations train commercial AIs, and therefore no implied permission to do so is granted by the act of making copyright work available on the internet.

2

[deleted] t1_jeaea2l wrote

[deleted]

−1

AbeWasHereAgain t1_jeaxxpf wrote

lol - say you regularly violate use terms on open source software without saying you regularly violate use terms on open source software.

4

Newfondahloose t1_jeb59mb wrote

There’s only so many ways to make a computer say “Hello, World!”. Don’t want it copied? Don’t make it public.

0

AbeWasHereAgain t1_jeb5q6e wrote

ha ha ha - yeah, that's exactly what we are talking about here.

PS - that's exactly what OpenAI is complaining about here.

1

NetrunnerCardAccount t1_jead72l wrote

Chris Pappas is almost certainly wrong.

They both crawled the Internet and undoubtedly have similarities in their data sets.

17

MuForceShoelace t1_jeaux6b wrote

I would like to sue every AI that used my data in its training. 1 million dollars per use.

8

Thorusss t1_jeavgqn wrote

Even if it is against the terms of service of ChatGPT, what are they going to do about it? There are no legal judgments on whether AI output is even copyrightable, and none on whether training on copyrighted material is fair use.

And OpenAI trained on a lot of copyrighted material, so they'd better think twice about opening that can of worms.

The only thing they can try to do is limit Google's access to ChatGPT's output, but good luck with that if they want it to remain available to the general public.

5

_zir_ t1_jeavn7w wrote

Sounds like a disgruntled worker who probably got laid off or fired if they think this matters.

2

Constantienus t1_jeamoc7 wrote

Wouldn't have been the first time. Same thing happened with Google Earth.

1

Next_Boysenberry1414 t1_jeao2qn wrote

ShareGPT open web resource...

So this "open web" thing is just a name? WTF does open mean?

1

Suberizu t1_jeb5du5 wrote

What the hell, Google, you really need to do that shit with your resources?

1

Sirisian t1_jeba6bn wrote

Rule 12, submit articles and sourced information.

1

AcceptableGood5105 t1_jebg8h9 wrote

They'd better worry about AI bots becoming so mature one day that they start violating humans and human society instead of copyrights.

1