Submitted by Wiskkey t3_10vg97m in MachineLearning
trias10 t1_j7huwpk wrote
Good, hopefully Getty wins.
xtime595 t1_j7hyhbw wrote
Ah yes, I love it when corporations decide to halt scientific progress
_poisonedrationality t1_j7i5r45 wrote
You shouldn't confuse "scientific progress" with "commercial gain". I know a lot of companies in AI blur the line, but researchers who don't seek to make a profit aren't really the same as a company like Stability AI, which is trying to sell a product.
Besides, it's not clear to me whether these AI tools will be used to benefit humanity as a whole or will only increase the control a few companies have over large markets. I really hope this case sets some decent precedents about how AI developers can use data they did not create.
EmbarrassedHelp t1_j7klnqy wrote
If Getty Images wins, then AI generation tools are going to become further concentrated in a handful of companies while also becoming less open.
HateRedditCantQuitit t1_j7nbrkz wrote
Not necessarily. If it turns out, for example, that language generation models trained on GPL code must themselves be GPL, then there's a possible path to more open models, provided content creators keep building copyleft content ecosystems.
currentscurrents t1_j7ioshb wrote
> Besides, it's not clear to me whether these AI tools will be used to benefit humanity as a whole
Of course they benefit humanity as a whole.
- Language models allow computers to understand complex ideas expressed in plain English.
- Automating art production will make custom art/comics/movies cheap and readily available.
- ChatGPT-style AIs (if they can fix hallucination/accuracy problems) give you an oracle with all the knowledge of the internet.
- They're getting less hype right now, but there are big advances in computer vision (CNNs/Vision Transformers) that are revolutionizing robotics and image processing.
> I really hope this case sets some decent precedents about how AI developers can use data they did not create.
You didn't create the data you used to train your brain, much of which was copyrighted. I see no reason why we should put that restriction on people trying to create artificial brains.
e_for_oil-er t1_j7kh4m0 wrote
Major corporations using ML to generate images instead of hiring artists, purely with the goal of increasing their profits, helping the richest guys get even richer. How does that help humanity?
VeritaSimulacra t1_j7ib4yu wrote
TIL science cannot progress without training ML models on Getty images
currentscurrents t1_j7innd5 wrote
Getty is just the test case for the question of copyright and AI.
If you can't train models on copyrighted data, then they can't learn information from the web outside of specific openly licensed websites like Wikipedia. This would sharply limit their usefulness. It also seems distinctly unfair, since copyright is only supposed to protect the specific arrangement of words or pixels, not the information they contain or the artistic style they're in.
The big tech companies can afford to license content from Getty, but us little guys can't. If they win, it will effectively kill open-source AI.
trias10 t1_j7iv9iy wrote
Data is incredibly valuable; OpenAI and Facebook have proven that. Ever-bigger models require ever more data, and we live in a capitalist world, so if something is valuable, like data, you typically have to pay for it. So open-source AI shouldn't be a thing.
Also, OpenAI is hardly open source anymore. They no longer disclose their data sources, harvesting practices, or data methodologies, nor do they release their training code. They also don't release their trained models anymore.
If they were truly open source, I could see maybe defending them, but at the moment all I see is a company violating data privacy and licences to get incredibly rich.
HateRedditCantQuitit t1_j7l1m4c wrote
>If you can't train models on copyrighted data this means that they can't learn information from the web outside of specific openly-licensed websites like Wikipedia. This would sharply limit their usefulness.
That would be great. It could lead to a future with things like copyleft data, where if you want to train on open stuff, your model legally *must* be open.
ninjasaid13 t1_j7irgdv wrote
I think he meant more about open source being threatened.
hgoel0974 t1_j7l1meo wrote
You seem like the type to argue that any ethics-related restrictions on science are bad.
superluminary t1_j7ldt9p wrote
If the US doesn’t allow it then China is just going to pick this up and run with it. These things are technically possible to do now. The US can either be at the front, leading the AI revolution, or can dip out and let other countries pick it up. Either way it’s happening.
klop2031 t1_j7hyv7h wrote
Nah, that's not good. Who cares about Getty... no one... let science move forward.
MisterBadger t1_j7iex5o wrote
If "science" can't move to the next base without getting enthusiastic consent from the other parties it hopes to involve, then "science" should damn well keep its mitts to itself. In the case of OpenAI, "science" got handsy with people's personal stuff who were unaware of what was going on, and who would not have given consent if they had known. OpenAI's approach to science is creepy, unethical, and messed up.
klop2031 t1_j7qcak1 wrote
Take a gander here: https://youtu.be/G08hY8dSrUY at 8 min 9 sec. It seems like no one knows how SCOTUS will deal with it, but a good argument is that an AI's experiences are like a human's, and that it generates new work by mixing in its own skill.
Further, it seems like the law may only be able to differentiate the two by the intelligence's physical makeup.
And to be honest, it seems like the only people mad about generative networks producing art are the artists about to lose their jobs.
Who cares if an AI can create art? If one only cares about the creative aspect, then humans can still make art too; no one is stopping them. But really it's about money.
MisterBadger t1_j7rj1bf wrote
Machine learning algorithms are not even "intelligent" enough to filter out Getty watermarks.
They do not have minds or experiences, any more than ZBrush or Cinema 4D or any other complicated software does.
Furthermore, they do not produce outputs the way humans do - the speed and scale are more akin to automated car factories than to human tinkerers.
Fair use laws were not designed with them in mind.
trias10 t1_j7i0pq3 wrote
I personally don't care a whit about Stable Diffusion. AI should be going after rote, boring tasks via automation, not creativity and art. That's the one thing actually enjoyable about life, and the last thing we should be automating with stupid models that are just scaled matrix multiplication.
ninjasaid13 t1_j7irv62 wrote
> That's the one thing actually enjoyable about life
Opinion.
MelonFace t1_j7khppc wrote
As if the rest of this whole thread isn't opinions, or opinions acting like facts about law that has not yet been explored.
FoveatedRendering t1_j7mlqeu wrote
If it's so enjoyable, all the more reason to automate it to get a lot more and increasingly better art.
Everyone enjoys art, but only 0.0001% can make it. AI will give the 99.9999% of people who enjoy art more options, and give superpowers to the previous 0.0001% (and any creator) to make more art.
VeritaSimulacra t1_j7ibe8b wrote
Hopefully. Stability winning would decrease the openness of the internet. I already know of software projects that aren't being open-sourced to avoid becoming training data, and I'm sure artists will be much less likely to share openly as well.
trias10 t1_j7ibyhq wrote
As it should be. If the openness of the internet means a few people get rich off the back of training on large swathes of data without explicit permission, then it should be stopped.
OpenAI should pay for their own labelled datasets, not harvest the internet without explicit permission and then sell the result back as GPT-3 to get rich. This absolutely has to be punished and stopped.
VeritaSimulacra t1_j7icqmp wrote
I agree with the goal, but I don’t think making the internet more closed is the way to go. The purpose of the internet is to be open. Making everything on the internet cost something would have a lot of negative effects on it. The solution to the powerful exploiting our openness isn’t to make it closed, but to regulate their usage of it.
trias10 t1_j7ifhdq wrote
I agree, hence I support this lawsuit and hope that Getty wins, and that a win leads to laws vastly curtailing which data AI can be trained on, especially when that data comes from artists/creators, who are already some of the lowest-paid members of society (unless they're in the lucky 0.01% of that group).
currentscurrents t1_j7ipcip wrote
OpenAI is doing a good thing. They've found a new and awesome way to use data from the open web, and they deserve their reward.
Getty's business model is outdated now, and the legal system shouldn't protect old industries from new inventions. Why search for a stock image that sorta kinda looks like what you want, when you could generate one that matches your exact specifications for free?
trias10 t1_j7irgui wrote
What good thing is OpenAI doing, exactly? I have yet to see any of their technologies used for any sort of societal good. So far the only things I have seen are cheating on homework and exams, faking legal documents, and serving as a dungeon master for D&D. The last one is kind of cool, but the first two are illegal.
Additionally, if you work in any kind of serious research division at a FAANG, you'd know there is collective suspicion of OpenAI's work. Their recent papers (or the lack thereof for ChatGPT) no longer describe the exact data they used (beyond saying "the Internet"), and they no longer release their training code, making independent peer review and verification impossible and causing many to question whether their data is legally obtained. At any FAANG, you need to rope Legal into any discussion about data sources long before you begin training, and most data you see on the internet isn't actually usable unless there is an explicit licence allowing it, so a lot of data is off limits. OpenAI seems to ignore that, hence they never discuss their data specifics anymore.
We live in a world of laws and multiple social contracts; you can't just do as you please. Hopefully OpenAI is punished and restricted accordingly, and starts playing by the same rules as everyone else in the industry. Fanboys such as yourself aren't helpful to the progress of responsible, legal, and ethical AI research.
currentscurrents t1_j7ivm0a wrote
> the only things I have seen are cheating on homework and exams, faking legal documents, and serving as a dungeon master for D&D. The last one is kind of cool, but the first two are illegal.
Well, that's just cherry-picking. LLMs could do very socially good things, like acting as an oracle for all internet knowledge or automating millions of jobs (assuming the accuracy issues get worked out - there are tons of researchers trying to do exactly that, some of whom are even on this sub).
By far the most promising use is allowing computers to understand and express complex ideas in plain English. We're already seeing uses of this: text-to-image generators use a language model to understand prompts and guide the generation process, and GitHub Copilot can turn instructions written in plain English into working code. A rough sketch of that prompt-conditioning step is below.
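To make the text-conditioning idea concrete, here's a minimal sketch using the open-source Hugging Face diffusers library. The checkpoint name and the settings are illustrative assumptions, not a description of how any particular commercial system works:

```python
# Minimal sketch: a text encoder turns an English prompt into embeddings
# that steer every denoising step of a diffusion image generator.
# The model ID and parameters below are illustrative, not prescriptive.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any diffusers-compatible checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    guidance_scale=7.5,  # how strongly the prompt conditions the image
).images[0]
image.save("lighthouse.png")
```

The language model's whole job here is to map the English sentence into the signal that guides generation - exactly the "understand complex ideas in plain English" capability described above.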
I expect we'll see them used in many more applications in the years to come, especially once desktop computers get fast enough to run them locally.
>starts playing by the same rules as everyone else in the industry.
Everyone else in the industry is also training on copyrighted data, because there is no source of uncopyrighted data big enough to train these models.
Also, your brain is updating its weights based on the copyrighted data in my comment right now, and that doesn't violate my copyright. Why should AI be any different?