Submitted by MINE_exchange t3_126lfjd in Futurology
[removed]
Are you referring to other people's algorithms or the content used to train the AI language model?
Not OP but I'd say they mean the content.
AI is trained using all the collective output of human beings on Reddit, Stackoverflow, etc.
We're not compensated for it, so why should anyone care if someone else uses the dataset they put together from this content?
The content is public and free to access? The algorithm is IP. I could be wrong though.
Just because something is free to access doesn't mean you have the right to do whatever you want with it, especially with regards to making derivative works without attribution or otherwise breaking license terms. This is what licenses and copyrights are for.
For example, if OpenAI scraped a code repository that uses a Creative Commons NonCommercial license and is using that code for monetary gain without the owner's consent, they're breaking that license. It'd have to be argued whether the fact that OpenAI used that code to train their models, which may generate code of similar likeness, counts as distributing the source, and whether having a user use that model under a paid service counts as a commercial violation of those terms.
The algorithm is IP, yes. But GPT-X is part model part training data.
No doubt those licenses were ignored, but without evidence how do you make that copyright claim? Without evidence does that make it derivative? What do you think?
I mean yeah I agree, enforcing something like this is going to be very very difficult. But there are several clear examples of something like DALL-E generating images very similar to or nearly identical to copyrighted IP.
IANAL, but I presume a claimant could establish with some reasonable certainty to a court that licensed works were used in a way that breaks the license, at which point OpenAI (or really any AI company) would be responsible for defending their practice or non-use of those licensed works.
"Generating" literal smudged watermarks from copyrighted content.
You signed over your rights to your content when you signed up to Reddit, or Facebook, or Google.
It's not like OpenAI is using some shoestring-budget web scraper built with Python and the BeautifulSoup library.
They have partnerships... and requested the raw text data.
You signed those rights away to these sites, not to OpenAI lol. It is still your IP. You cannot go and copy it just because it was posted there, because you will be hit with an infringement lawsuit. Reddit, Facebook, and Google received your permission to use it in certain ways. And yes, Google or Facebook can potentially claim they used that data fairly for their models. OpenAI? Not a chance.
Unfortunately you're wrong:
The Services may contain information, text, links, graphics, photos, videos, audio, streams, or other materials (“Content”), including Content created with or submitted to the Services by you or through your Account (“Your Content”). We take no responsibility for and we do not expressly or implicitly endorse, support, or guarantee the completeness, truthfulness, accuracy, or reliability of any of Your Content.
By submitting Your Content to the Services, you represent and warrant that you have all rights, power, and authority necessary to grant the rights to Your Content contained within these Terms. Because you alone are responsible for Your Content, you may expose yourself to liability if you post or share Content without all necessary rights.
You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
Any ideas, suggestions, and feedback about Reddit or our Services that you provide to us are entirely voluntary, and you agree that Reddit may use such ideas, suggestions, and feedback without compensation or obligation to you.
Although we have no obligation to screen, edit, or monitor Your Content, we may, in our sole discretion, delete or remove Your Content at any time and for any reason, including for violating these Terms, violating our Content Policy, or if you otherwise create or are likely to create liability for us.
Why do you even bother copying something without reading it?
"You retain any ownership rights..."
End of story, I am right. It says exactly what I said it did. You grant Reddit (and only Reddit) rights to manipulate your content as written in the TOS. You do not grant them to anyone else. If Reddit partners with someone, then they would also be included if Reddit gave them that right. But this is not what happened. OpenAI scraped the internet. There was no partnership with Reddit or anyone whatsoever.
You retain ownership... but you more or less signed over all rights to what they can do with said information... it's right there in the highlighted text.
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit << this is the part that lets them hand it over to companies like OpenAI
Again, if Reddit trained their own AI on users' data or gave that data to OpenAI as part of a contract, then you would have a point. But this is not what happened. OpenAI did not ask anyone. They ran data-crawling scripts and stole data without asking. It is nothing like what Reddit is doing. You did not sign anything over to OpenAI.
> this is the part that lets them hand it over to companies like OpenAI
Your claim is that OpenAI has negotiated usage rights from every single site it’s gotten data from? Do you have any evidence for this?
Just because something is free to access, doesn’t mean you are allowed to remember it or learn from it in any way!!
Go ask Vanilla Ice what happens when your music sounds a little too close to the original.
OpenAI, and Microsoft, are 100% violating terms of use for the vast majority of the stuff they scraped.
All musicians learn from hearing other music.
There is a difference between learning and copying.
So if I call something in my process "learning" I'm free to use it? I'm learning copies of your works on my printer to sell right now
Edit: to be clear I don't know what the answer is but that seems simplistic
Tbh can we separate human learning from AI learning?
A human is an imperfect biological being that requires time and repetition to learn.
AI just needs a large pool of data and can do the same as millions of humans in a fraction of the time required.
I think that's not the same learning, and it's a thing that honestly should be questioned. After all, our content was created with humans in mind and not meant to be used for AI.
The learning isn’t the problem, the selling is.
They are selling their own work. There’s only so many ways you can answer a question. Just because you’ve answered the question before, doesn’t mean someone else can’t come to the same conclusion when answering for themselves.
lol - you don't think ChatGPT is spitting out insanely close replicas of other people's work daily?
Nothing wrong with playing/singing other people’s songs, I sing along to the radio in my car all the time.
ha ha ha - yeah, totally the same thing. Just an FYI, artists are required to pay when they do a cover.
Everything changes when you start making money off other people's work.
That’s exactly my point.
What is your point?
It’s not a problem until you start making money off other people’s work.
Are monetized AI artists paying royalties to everyone whose art was scraped off the web?
Are human artists paying royalties to everyone whose art they scraped off the web??
[deleted]
Human artists learning from others' work is obviously "fair use." I don't think a corporation will successfully deploy that in defense of training a commercial AI.
Just looking at a piece of art is enough to encode it into a human’s neural network. Why should it be any different for an artificial neural network? If it’s free to access then it’s free to access.
I don't believe that an artificial neural network is morally or legally equivalent to a human. If I did believe that, then there would be more pressing issues than copyright infringement to deal with, such as corporate enslavement of AIs.
What does moral or legal equivalence to humans have to do with anything?
The point is that all AI has to do to learn from art is look at it. If someone makes their art free to look at, then it’s free for an AI to look at.
AIs don't have agency. The AI is a tool which is being operated by a corporate entity. The corporate entity is governed by existing laws, and requires a license to use a copyright work in the operation of their business.
So companies have to pay a licensing fee to every artist whose work employees of that company have ever looked at?? Yeah, I don’t think that’s how it works.
[deleted]
No, because (as I mentioned earlier) there is a fair use exemption which allows humans to be educated using copyright works. However, there is no such exemption allowing corporations to train AI using copyright works.
Education is irrelevant in this context. The copyrighted works people consume through education are a tiny fraction of the total number of copyrighted works that most people experience through their lives. And all of those experiences contribute to that person’s capabilities.
The exemption for education’s purposes is for presenting copyright material to students in an education setting. It has nothing to do with copyright work that the student might seek out themselves.
Yes, humans experience copyright works and learn from them, and that's fair use. What does that have to do with training an AI?
A person or corporation training an AI is covered by normal copyright law, which requires a license to use the work.
How is it any different to an employee “using” the work? Corporations don’t pay licensing when an employee gets inspired by a movie they saw last night.
Why do you keep mentioning corporations? An AI could just as easily be trained by an individual. I’ve written and trained a few myself.
>Corporations don’t pay licensing when an employee gets inspired by a movie they saw last night.
The employee themselves paid to view the movie. The copyright owner set the amount of compensation knowing that the employee could retain and use the knowledge gained. No more compensation is due. This is nothing like a person or corporate entity using unlicensed copyright works to train an AI.
>Why do you keep mentioning corporations? An AI could just as easily be trained by an individual. I’ve written and trained a few myself.
Me too. I keep saying "person or corporation training an AI" to remind us that the law (and any moral judgement) applies to the person or corporate entity conducting the training, not to the AI per se, because the AI is merely a tool and is without agency of its own.
“What does that have to do with a person or corporate entity training an ai?”
Training a human neural network is analogous to training an artificial neural network.
Whether the employee paid to watch a movie doesn’t matter; they could have just as easily watched something distributed for free. The transaction to consume the content is, as you said, irrelevant to the corporation.
An AI consuming a copyright work is no different to a human consuming a copyright work. If that work is provided for free consumption, why would the owner of the AI have to pay for the AI to consume it?
[deleted]
>Training a human neural network is analogous to training an artificial neural network.
By definition, something analogous is similar but not the same. Lots of things are analogous to others, but that doesn't even remotely imply that they should be governed by the same laws and morality.
>An AI consuming a copyright work is no different to a human consuming a copyright work.
A human consuming food is no different to a dog consuming food. Yet we have vastly different laws governing human food compared to dog food. Dogs and AI are not humans, and that is the difference.
>If that work is provided for free consumption, why would the owner of the AI have to pay for the AI to consume it?
If that work is provided for free consumption, why would the owner of a building have to compensate the copyright owner to print a large high quality copy and hang it on a public wall in the lobby? The answer is that the person (not the AI) is deriving some benefit (beyond fair use) from their use of the copyrighted work, and therefore the copyright owner should be compensated.
The building owner is using a replication of the copyrighted work. The owner should absolutely compensate the original creator.
But the printing company that the building owner hires to print the poster doesn’t owe the original creator anything. Even though it is directly replicating copyrighted work, and certainly benefiting from doing so. If the printer were selling the copyrighted works directly then that would be a different matter and they would have to compensate the original copyright owner. So clearly context matters.
An AI doesn’t even make a replication of the original work as part of its training process.
If the AI then goes on to create a replication, or a new work that is similar enough to the original that copyright applies, and the intent is to use that work in a context where copyright would apply, then absolutely. That would constitute a breach of copyright.
It is the work itself that is copyrighted, not the knowledge/ability to create the work. It’s the knowledge of how to create the work which is encoded in the neural network.
Lots of people benefit from freely distributed content. Simply benefiting from it is not enough to justify requiring a license fee.
Hypothetically speaking, let’s say a few years down the line we have robot servants. I have a robotic care giver that assists me with mobility. Much as I may have a human care giver today.
If I go to the movies with my robot care giver, they will take up a seat so I would expect to pay for a ticket, just as I would for a human care giver. Do I then need to pay an extra licensing fee for the robot’s AI brain to actually watch the movie?
What if it’s a free screening? Should I still have to pay for the robot brain to “use” the movie?
Is the robot “using” the movie in some unique and distinct way compared to how I would be “using” the movie?
It’s learning and using language to answer questions. There’s only so many ways you can answer the same question. Greed getting in the way of progress, as always. Guess professors should give a citation every time they give a verbal answer even though they are answering from memory.
There is a difference between a human who can be creative and a computer program that creates aggregations. Completely different thing. AI does not really learn. It adjusts its mathematical functions based on data.
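To make "adjusts its mathematical functions based on data" concrete, here is a minimal sketch of plain gradient descent fitting a single weight. It is not how ChatGPT is actually trained; the function name `fit_slope`, the toy data, and the learning rate are all made up for illustration.

```python
def fit_slope(xs, ys, lr=0.01, steps=500):
    """Fit y ≈ w * x by gradient descent on mean squared error."""
    w = 0.0  # the single adjustable parameter (assumed starting point)
    n = len(xs)
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Nudge the parameter in the direction that reduces the error.
        w -= lr * grad
    return w


# Toy data generated by y = 2x; the loop should recover a slope near 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = fit_slope(xs, ys)
print(w)
```

Whether you call that loop "learning" or "adjusting a function" is exactly what this thread is arguing about; the code only shows the mechanics.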
No, there is no difference. Creativity is just combination and random mutation. It’s how humans are creative, it’s how machines are creative. It’s the same thing.
This is utter bullshit. There was always some human who came up with something first, when there was nothing like it before. The AI technology we know does not have this ability, and never will. It is only data aggregation, nothing else. A human does not need data from other humans to be creative, and the very fact that someone climbed down from the trees and picked up the first fire is proof of that.
Combination + mutation. It allowed evolution through natural selection to give us every life form on earth. Creativity works exactly the same way.
ChatGPT has “learned” some generalizations from the text that it’s processed, but it has also literally memorized (i.e. copied) billions of words from it.
Technically it remembers the relationships between words, those relationships are encoded in its neural network. It doesn’t just copy the text.
https://en.m.wikipedia.org/wiki/Transformer_(machine_learning_model)
Yeah, and it makes sense for a human, but I can see this being an issue with AI and how fast it can learn.
After all, whatever I posted anywhere is suddenly being used to generate revenue, when it was originally aimed at people, for free, to get a response for free. AI, though, usually requires you to pay for it. So why shouldn't they pay me to use my data? Sure, maybe someone made money with my response, and I might buy some of their stuff, and that's fine, because it wasn't only because of my input, unlike AI, which only works because of the data. Same with artists. They were posting stuff for free not so it could be used for free, but to present their art and land a job. You also can't just rip an image from the internet and use it in a commercial because "it was freely available on the internet".
We're talking about ethics here, not unethical legal loopholes
Ethics are different for everyone. I find it unethical to hold back society just because you want to be referenced or given 5 cents for your shitty, regurgitated blog post.
That was kind of the opposite point: that OpenAI would have some nerve to be mad at Google for using ChatGPT to generate training data when they used everyone's data as training data.
Everything free to access that is not under a permissive license is by definition the IP of the one who put it out. Even if you take a picture and put it on Facebook, it is your IP. Facebook might have a TOS that says they have the right to do certain things with what you post on their site. Sure. But you gave them permission by agreeing to it. OpenAI never received any permission from anyone. Period.
The algorithm is barely IP, and the data is the bigger part of its success.
ChatGPT is a reinforcement learning tuned transformer. The ideas and architecture it's built on aren't proprietary. The specific parameters are, but that's not actually that important. The size and number of layers, for example. Most people in AI can make some assumptions. Probably ReLU, probably Adam, etc. Then there are different knobs you can twiddle and with some trial and error you dial it in.
The size and quality of your training data are way more important, and in the case of ChatGPT, so is your compute power. Lots of people can design a system that big, it's as easy as it is to come up with big numbers, but training it takes a ton of compute power, which costs money, which is why not just anyone has already done it if it's so easy.
It should also be said that GPT is a bit of a surprise success. Before models this size, it was a big risk. You're gonna spend millions to train a model, and you won't know until it's done how good it will be.
Most advancements in AI are open source and public. Those all help advance the field, but at the same time, it's also about taking a bit of a risk, and waiting to see how it pans out before taking the next risk.
Also, there's transfer learning. If you spend a hundred million training a model, I can use your trained model and a fraction of the money to make my own.
It's like if you laboriously took painstaking measurements to figure out an exact kilogram and craft a 1kg weight. You didn't invent the kilogram, difficult as it was to make it. If I use yours to make my own, I'm not infringing on your IP.
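A minimal sketch of the transfer-learning idea, under loud assumptions: `pretrained_features` is a hypothetical stand-in for someone else's expensive, already-trained layers (here just a fixed function), and only a tiny new "head" gets trained on a new task. Real transfer learning reuses learned network weights; this only shows the shape of it.

```python
def pretrained_features(x):
    # Hypothetical stand-in for frozen, already-trained layers.
    # We reuse it as-is and never update it.
    return [1.0, x, x * x]


def train_head(xs, ys, lr=0.1, steps=2000):
    """Train only a small linear head on top of the frozen features."""
    head = [0.0, 0.0, 0.0]
    n = len(xs)
    feats = [pretrained_features(x) for x in xs]  # computed once, never retrained
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for f, y in zip(feats, ys):
            # Prediction error of the current head on this example.
            err = sum(h * fi for h, fi in zip(head, f)) - y
            for i in range(3):
                grad[i] += 2 * err * f[i] / n
        # Gradient step on the head weights only.
        head = [h - lr * g for h, g in zip(head, grad)]
    return head


# New task (made up for illustration): y = 3x^2 + 1, with only five examples.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [3 * x * x + 1 for x in xs]
head = train_head(xs, ys)
print(head)
```

The cheap part is the three head weights; the expensive part, in a real setting, would be whatever produced the frozen features in the first place.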
But that's not what this post is about at all? What?
Because logic is out the window when it comes to hypocrisy nowadays. If you think we should limit GHG emissions, you can’t use any form of energy. If you think Russia was wrong to invade Ukraine and commit genocide, then you cannot be a citizen of any country that has ever been in a war.
It'd be the same use case as academic books: the knowledge is everywhere, dating back to Pythagoras, but having it available in a usable manner is where the crux lies.
[removed]
Can you link me to where OpenAI got all up in arms?
In a shocking twist, an AI was built on data taken without permission.
In a shocking twist, everyone using a browser takes the same data with the same permissions.
In a shocking twist, there is nothing new under the sun.
In a shocking twist, posting data on social media constitutes implied permission for other users to process it in their browsers in order to read it.
However, in a second shocking twist, posting doesn't constitute implied permission for corporations to train AI with the contents of posts.
I'm shocked by how many people unquestioningly accept the idea that AI should be entitled to the same rights as humans, as if a machine that scrapes huge portions of the internet for content is exactly the same as one person browsing.
What do you think search engines need to do to give you the results?
People making copyright work available on the internet are granting an implied permission for search engines to index their work, because that's pursuant to the normal purposes of posting on the internet. People make work available on the internet for the purpose of allowing others to find it using search engines and view it using browsers.
However, making copyright work available on the internet does not constitute an implied permission or license to do literally anything with the posted work. People don't usually post work on the internet for the purpose of helping corporations train commercial AIs, and therefore no implied permission to do so is granted by the act of making copyright work available on the internet.
[deleted]
[deleted]
lol - say you regularly violate use terms on open source software without saying you regularly violate use terms on open source software.
There’s only so many ways to make a computer say “Hello, World!”. Don’t want it copied? Don’t make it public.
ha ha ha - yeah, that's exactly what we are talking about here.
PS - that's exactly what OpenAI is complaining about here.
Chris Pappas is almost certainly wrong.
They both crawled the Internet and undoubtedly have similarities in their data sets.
I would like to sue every AI that used my data in its training. 1 million dollars per use.
Even if it is against the terms of service of ChatGPT, what are they going to do about it? There are no legal judgments on whether AI output is even copyrightable, and no judgments on whether training on copyrightable material is fair use.
And OpenAI trained on a lot of copyright material, so they better think twice about opening that can of worms.
The only thing they can try to do is limit Google's access to ChatGPT's output, but good luck with that if they want it to remain available to the general public.
Sounds like a disgruntled worker who probably got laid off or fired if they think this matters.
[removed]
[removed]
Wouldn't have been the first time. Same thing happened with Google Earth.
ShareGPT open web resource...
So this "open web" thing is just a name? WTF does open mean?
"OpenAI" is the next "Valve can't count to 3" meme
[removed]
What the hell, Google, you really need to do that shit with your resources?
Rule 12, submit articles and sourced information.
They’d better worry about AI bots becoming so mature one day that they start violating humans and human society instead of copyrights
AbeWasHereAgain t1_je9mpyf wrote
I love that OpenAI uses a ton of other people's work to train their model, yet when someone uses OpenAI to train their model, they get all up in arms.
As far as I'm concerned, OpenAI has decided terms of use don't exist anymore.