Submitted by Wiskkey t3_10vg97m in MachineLearning

From the article:

>Getty Images has filed a lawsuit in the US against Stability AI, creators of open-source AI art generator Stable Diffusion, escalating its legal battle against the firm.
>
>The stock photography company is accusing Stability AI of “brazen infringement of Getty Images’ intellectual property on a staggering scale.” It claims that Stability AI copied more than 12 million images from its database “without permission ... or compensation ... as part of its efforts to build a competing business,” and that the startup has infringed on both the company’s copyright and trademark protections.

This is different from the UK-based news from weeks ago.

125

Comments


MelonFace t1_j7hd5sm wrote

Filing in the US at the same time as the UK is an interesting move.

I'm not a lawyer, but I'm wondering if this is a means of overwhelming SD's legal capacity.

40

EmmyNoetherRing t1_j7ia3n1 wrote

It also doesn’t give SD a chance to apply any lessons learned from the first case to the second one, I guess.

27

IndustryNext7456 t1_j7jb1ey wrote

Pot calling the kettle black: a company known for appropriating images that don't belong to it, suing another...

39

Jrowe47 t1_j7js4yq wrote

Well no, that would imply some form of theft or infringement by Stable Diffusion. More like the pot accusing the apple of witchcraft. Getty is on the way out; this is pure corporate desperation.

16

FyreMael t1_j7iqqhb wrote

Getty is a blatant copyright infringer themselves.

Also, LAION gathered the images via crowdsourcing. I participated.

Y'all need to brush up on the law.

For the US, here is the current legal code for copyright law: https://www.law.cornell.edu/uscode/text/17

30

currentscurrents t1_j7iwvii wrote

> Also, LAION gathered the images via crowdsourcing. I participated.

I don't think the data collection methodology is really relevant. However the dataset was gathered, there are certainly ways to use it that would violate copyright; you couldn't print it in a book, for example.

The important question is if training a generative AI on copyrighted data is a violation of copyright. US copyright law doesn't address this because AI didn't exist when it was written. It will be up to the courts to decide how this new application interacts with the law.

41

ThrillHouseofMirth t1_j7j0s9c wrote

It's going to be very hard to convince a judge of the assertion that they're a "competing business."

11

Scimmia8 t1_j7j83t0 wrote

Why? A lot of websites are already starting to use AI-generated images rather than stock photos as headers for their articles. They would previously have paid companies like Getty for these.

22

jobeta t1_j7jda4k wrote

It's clearly the case already. Shutterstock sold pictures to OpenAI to create DALL-E 2, which will soon be used to create what used to be stock photography. This example here is ridiculously bad tho 🤣

3

ThrillHouseofMirth t1_j7q0053 wrote

The DALL-E 2 thing is an interesting wrinkle, but a judge will need to be convinced that a photograph and a generated image that looks like a photograph are the same thing. They aren't, imo, but it doesn't matter what I think.

1

[deleted] t1_j7itrzw wrote

[deleted]

5

currentscurrents t1_j7iy068 wrote

The exception Google Images got is pretty narrow and only applies to their role as a search engine. Fair use is complex, depends on a lot of case law, and involves balancing several factors.

One of the factors is "whether your use deprives the copyright owner of income or undermines a new or potential market for the copyrighted work." Google Image thumbnails clearly don't compete with the original work, but generative AI arguably does - the fact that it could automate art production is one of the coolest things about it.

That said, this is only one of several factors, so it's not a slam dunk for Getty either. The most important factor is how much you borrow from the original work. AI image generators borrow only abstract concepts like style, while Google was reproducing thumbnails of entire works.

Anybody who thinks they know how the courts will rule on this is lying to themselves.

22

clueless1245 t1_j7iz62v wrote

Got it, thanks! Yeah, I guess it's more complicated than I thought.

1

HateRedditCantQuitit t1_j7l30f2 wrote

I hate Getty as much as anyone, but I'm going to go against the grain and hope they win this. Imagine if instead of Getty vs Stability it were ArtStation vs Facebook or something. The same legal principles must apply.

In my ideal future, we'd have things like

- research use is free, but commercial use requires opt-in consent from content creators

- the community adopts open licenses, e.g. copyleft (if you use a GPL9000 dataset, the model must be GPL too, or whatever) or some other widely used opt-in license.

5

JustOneAvailableName t1_j7le6dw wrote

> but commercial use requires opt-in consent from content creators

With opt-in, you might as well just ban commercial use directly.

7

TaXxER t1_j7ojt22 wrote

As much as I like ML, it's hard to argue that training ML models on data without consent, let alone on copyrighted data, is somehow OK.

3

JustOneAvailableName t1_j7oknmi wrote

Copyright is about redistribution, and we're talking about publicly available data. I don't want or need to give consent to specific people/companies to allow them to read this comment. Nor do I think it should now be up to Reddit to decide what is and isn't allowed.

3

TaXxER t1_j7omop6 wrote

Generative models do redistribute though, often outputting near copies:

https://openaccess.thecvf.com/content/WACV2021/papers/Tinsley_This_Face_Does_Not_Exist..._But_It_Might_Be_Yours_WACV_2021_paper.pdf

https://arxiv.org/pdf/2203.07618.pdf

Copyright does not only cover republishing; it also covers derivative works. I think it is a very reasonable position to consider any generative-model output o on which some training-set image Xi had a particularly large influence to be a derivative work of Xi.
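
To make "particularly large influence" concrete, here is a minimal sketch of the kind of near-copy check those papers motivate (an illustration, not their exact method; the file names and the 0.95 threshold are hypothetical):

```python
# Hedged sketch: flag generated images that sit suspiciously close to
# known training images in an embedding space (file names are placeholders).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)  # unit-normalize

generated = embed(["generated.png"])
training = embed(["train_001.png", "train_002.png"])
sims = generated @ training.T          # cosine similarities
print((sims > 0.95).nonzero())         # pairs that look like near-copies
```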

A similar story holds for code-generation models and software licensing: Copilot was trained on lots of repos whose licenses require all derivative work to be distributed under the same or a compatible license. Copilot may very well output a specific code snippet based largely on what it has seen in a particular repo, thereby potentially binding the user to the licensing obligations that come with deriving work from that repo.

I'm an applied industry ML researcher myself, and I'm very enthusiastic about the technology and state of ML. But I also think that, as a field, we have unfortunately been careless about the ethical and legal aspects.

2

scottyLogJobs t1_j7llrrl wrote

Why? Compare the top two images. It is a demonstration that they trained on Getty images, but there's no way anyone could argue that the nightmare fuel on the right deprives Getty of any money. Do you remember when Getty sued Google Images and won? Sure, Google is powerful and makes plenty of money, but image search is now way worse for consumers than it was a decade ago: you can't just open the image or even a link to it; you have to follow it back to the source page and dig around, probably never finding it at all. Ridiculous that effectively embedding a link isn't considered fair use; you'd still need to pay to use a Getty image 🤷‍♂️

Setting aside the fact that Getty is super hypocritical, constantly violates copyright law itself, and uses its litigators to push around smaller groups: if they win, it's just going to be another step toward only the big companies having access to data, making it impossible for smaller players to compete.

People fighting against technological advancement and innovation are always on the wrong side of history. There will always be a need for physical artists, digital artists, photographers, etc., because the value of art is incredibly subjective, the value is generated by the artist rather than the art, and client needs are so specific, detailed, and iterative that an AI can't meet them.

Instead of seeing this tool as an opportunity for artists, they fight hopelessly against innovation and throw their lot in with huge bully companies like Getty Images.

4

jarkkowork t1_j7jmx5v wrote

Is it copyright infringement for search services to cache (parts of) crawled webpages, to summarize their content, or to produce a kind of "feature vector" of said webpages for business-related use?
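
For a concrete (hypothetical) reading of that last part, a "feature vector" of a crawled page might be as simple as hashing its visible text into a fixed-size numeric vector:

```python
# Sketch only: crawl one page, strip it to text, hash into a 4096-dim vector.
# The URL is a placeholder; real crawlers also honor robots.txt, rate limits, etc.
import requests
from bs4 import BeautifulSoup
from sklearn.feature_extraction.text import HashingVectorizer

html = requests.get("https://example.com").text
text = BeautifulSoup(html, "html.parser").get_text(" ")
vector = HashingVectorizer(n_features=4096).transform([text])  # 1 x 4096, sparse
```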

3

Cherubin0 t1_j7kfvps wrote

There are enough open-culture images out there; images from greedy corporations aren't needed anymore.

1

icouldusemorecoffee t1_j7jq3h2 wrote

Will never hold up. Getty's images can be viewed publicly, and that's what SD was doing: viewing the public images Getty put out there and then generating private data about those views. It's not a lot different from me looking at 100 images on Getty and creating an Excel chart listing the color variations I saw.
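
Taken literally, that analogy is something like the following toy sketch (file names are stand-ins; only aggregate color statistics are kept, never the images):

```python
# Toy sketch of "look at images, keep only a chart of color variations".
import csv
from PIL import Image
from PIL.ImageStat import Stat

rows = []
for path in ["img_001.jpg", "img_002.jpg"]:  # stand-ins for 100 viewed images
    with Image.open(path) as im:
        r, g, b = Stat(im.convert("RGB")).mean  # per-channel mean color
    rows.append({"image": path, "r": r, "g": g, "b": b})

with open("color_variations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["image", "r", "g", "b"])
    writer.writeheader()
    writer.writerows(rows)
```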

−2

red_dragon t1_j7jufme wrote

Viewing and scraping for derived works are different things.

4

[deleted] t1_j7hjm4p wrote

[deleted]

−20

PHEEEEELLLLLEEEEP t1_j7hm3td wrote

Top legal mind of reddit

41

[deleted] t1_j7hyr6a wrote

[deleted]

−14

MisterBadger t1_j7idrz1 wrote

>Humans do take inspiration from others' work...

Ugh. This justification is creaky and useless.

Machines take instructions, and have zero inspiration.

Human artists aren't an endless chain of automated digital art factories producing mountains of art "by_Original_Artist".

One unimaginative guy copycatting a more imaginative artist is not going to be able to flood the market overnight with thousands of images that substantially replace the original creator's output.

1

Centurion902 t1_j7ii94j wrote

This doesn't even mean anything unless you define inspiration.

4

MisterBadger t1_j7jd47m wrote

Nothing means anything if you're unfamiliar with the commonly understood meaning of words.

The dictionary definition of "inspiration":

>the process of being mentally stimulated to do or feel something, especially to do something creative.

Diffusion models are not minds, and do not have them.

3

Centurion902 t1_j7jqu3d wrote

I see nothing about minds in that definition.

−2

MisterBadger t1_j7jyejy wrote

Is English a second language for you?

Mentally (adverb) - in a manner relating to the mind.

1

tsujiku t1_j7kj8y0 wrote

Is a "mind" a blob of flesh or is it the combination of chemical interactions that happen in that blob of flesh.

Could a perfect simulation of those chemical interactions be considered a "mind?"

What about a slightly simplified model?

How far down that path do you have to go before it's no longer considered a "mind?"

You act like there are obvious answers to these questions, but I don't think you would have much luck if you had to get everyone to agree with you.

−2

MisterBadger t1_j7kjls1 wrote

Y'all need to stop stretching definitions of words past the breaking point.

I am not "acting like" anything. I simply understand the vast difference between a human brain and a highly specialized machine learning algorithm.

Diffusion models are not minds and do not have them.

You only need a very basic understanding of machine learning VS human cognition to be aware of this.

AI ≠ Actual Intelligence;

Stable Diffusion ≠ Sentient Device.

4

GusPlus t1_j7i4c70 wrote

I feel like the fact that the AI produces images with the Getty Images watermark is pretty decent proof that it copied images.

7

Ne_Nel t1_j7i4u96 wrote

That's not much smarter than that comment, tbh.

2

GusPlus t1_j7i5lt3 wrote

I'd like to know how it was trained to produce the GI watermark without GI images having been copied into the training data.

5

Ne_Nel t1_j7i67yp wrote

What are you talking about? The dataset is open source and there are thousands of Getty images. That isn't the discussion here.

3

orbital_lemon t1_j7idllq wrote

It saw stock photo watermarks millions of times during training. Nothing else in the training data comes even close. Even at half a bit per training image, that can add up to memorization of a shape.

Apart from the handful of known cases involving images that are duplicated many times in the training data, actual image content can't be reconstructed the same way.
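
The back-of-envelope arithmetic behind that claim (a sketch; every figure here is an assumption, not a measured number):

```python
# Why a watermark can be memorized when individual photos cannot:
# capacity that is negligible per image accumulates across repetitions.
bits_per_image = 0.5                # "half a bit per training image" (assumed)
watermark_sightings = 12_000_000    # Getty alone alleges 12M+ of its images
single_photo_sightings = 1          # a typical photo appears roughly once

print(bits_per_image * watermark_sightings)      # ~6,000,000 bits toward one shape
print(bits_per_image * single_photo_sightings)   # 0.5 bits: nowhere near an image
```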

2

pm_me_your_pay_slips t1_j7l6icx wrote

Note that the VQ-VAE part of the SD model alone can encode and decode arbitrary natural/human-made images pretty well, with very few artifacts. The diffusion-model part of SD learns a distribution over images in that encoded space.
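
As a concrete illustration, here is a minimal encode/decode roundtrip through SD's autoencoder using the Hugging Face diffusers library (a sketch; note the released SD checkpoints ship a KL-regularized VAE rather than a VQ one, and the checkpoint ID is just one published example):

```python
# Sketch: images pass through the autoencoder; the diffusion model itself
# only ever operates in the smaller 4 x 64 x 64 latent space.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = torch.randn(1, 3, 512, 512)  # stand-in for a real RGB image scaled to [-1, 1]
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # 1 x 4 x 64 x 64
    recon = vae.decode(latents).sample                # 1 x 3 x 512 x 512
```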

1

orbital_lemon t1_j7lel1d wrote

The diffusion model weights are the part at issue, no? The question is whether you can squeeze infringing content out of the weights to feed to the VAE.

1

f10101 t1_j7is2sw wrote

They undeniably did copy them for training, which is the allegation. Not even Stability would deny that.

The question is whether doing that is legal. A plain reading of US law suggests to me that it is, but Getty will argue otherwise.

7

trias10 t1_j7huwpk wrote

Good, hopefully Getty wins.

−40

xtime595 t1_j7hyhbw wrote

Ah yes, I love it when corporations decide to halt scientific progress

21

_poisonedrationality t1_j7i5r45 wrote

You shouldn't confuse "scientific progress" with "commercial gain". I know a lot of companies in AI blur the line, but researchers who don't seek to make a profit aren't really the same as something like Stability AI, which is trying to sell a product.

Besides, it's not clear to me whether these AI tools will be used to benefit humanity as a whole or will only increase the control a few companies have over large markets. I really hope this case sets some decent precedents about how AI developers can use data they did not create.

4

EmbarrassedHelp t1_j7klnqy wrote

If Getty Images wins, then AI generation tools are going to become further concentrated in the hands of a few companies while also becoming less open.

8

HateRedditCantQuitit t1_j7nbrkz wrote

Not necessarily. If it turns out, for example, that language generation models trained on GPL code must themselves be GPL, then there's a possible path to more open models, provided content creators continue building copyleft content ecosystems.

1

currentscurrents t1_j7ioshb wrote

> Besides, it's not clear to me whether these AI tools will be used to benefit humanity as a whole

Of course they benefit humanity as a whole.

  • Language models allow computers to understand complex ideas expressed in plain English.
  • Automating art production will make custom art/comics/movies cheap and readily available.
  • ChatGPT-style AIs (if they can fix hallucination/accuracy problems) give you an oracle with all the knowledge of the internet.
  • They're getting less hype right now, but there's big advances in computer vision (CNNs/Vision Transformers) that are revolutionizing robotics and image processing.

>I really hope this case sets some decent precedents about how AI developers can use data they did not create.

You didn't create the data you used to train your brain, much of which was copyrighted. I see no reason why we should put that restriction on people trying to create artificial brains.

4

e_for_oil-er t1_j7kh4m0 wrote

Major corporations using ML to generate images instead of hiring artists, purely with the goal of increasing their profits, helping the richest guys get even richer. How does that help humanity?

0

VeritaSimulacra t1_j7ib4yu wrote

TIL science cannot progress without training ML models on Getty images

2

currentscurrents t1_j7innd5 wrote

Getty is just the test case for the question of copyright and AI.

If you can't train models on copyrighted data, then they can't learn information from the web outside of specific openly-licensed websites like Wikipedia. This would sharply limit their usefulness. It also seems distinctly unfair, since copyright is only supposed to protect the specific arrangement of words or pixels, not the information they contain or the artistic style they're in.

The big tech companies can afford to license content from Getty, but us little guys can't. If they win it will effectively kill open-source AI.

16

trias10 t1_j7iv9iy wrote

Data is incredibly valuable; OpenAI and Facebook have proven that. Ever-bigger models require ever more data, and we live in a capitalist world: if something is valuable, like data, you typically have to pay for it. So open-source AI shouldn't be a thing.

Also, OpenAI is hardly open source anymore. They no longer disclose their data sources, data-harvesting practices, or data methodologies, nor do they release their training code. They also no longer release their trained models.

If they were truly open source, I could maybe see defending them, but at the moment all I see is a company violating data privacy and licences to get incredibly rich.

1

HateRedditCantQuitit t1_j7l1m4c wrote

>If you can't train models on copyrighted data, then they can't learn information from the web outside of specific openly-licensed websites like Wikipedia. This would sharply limit their usefulness.

That would be great. It could lead to a future with things like copyleft data, where if you want to train on open stuff, your model legally *must* be open.

1

ninjasaid13 t1_j7irgdv wrote

I think he meant more about open source being threatened.

3

hgoel0974 t1_j7l1meo wrote

You seem like the type to argue that any ethics related restrictions on science are bad.

1

superluminary t1_j7ldt9p wrote

If the US doesn't allow it, then China will just pick this up and run with it. These things are technically possible now. The US can either be at the front, leading the AI revolution, or dip out and let other countries pick it up. Either way, it's happening.

1

klop2031 t1_j7hyv7h wrote

Nah, that's not good. Who cares about Getty... no one... let science move forward.

0

MisterBadger t1_j7iex5o wrote

If "science" can't move to the next base without getting enthusiastic consent from the other parties it hopes to involve, then "science" should damn well keep its mitts to itself. In the case of OpenAI, "science" got handsy with people's personal stuff who were unaware of what was going on, and who would not have given consent if they had known. OpenAI's approach to science is creepy, unethical, and messed up.

2

klop2031 t1_j7qcak1 wrote

Take a gander here: https://youtu.be/G08hY8dSrUY (at 8:09). It seems like no one knows how SCOTUS will deal with it, but a good argument is that an AI's experiences are like a human's, and it generates new work by mixing in its own skill.

Further, it seems like the law may only differentiate by the intelligence's physical makeup.

And to be honest, it seems like the only people mad about generative networks producing art are the artists about to lose their jobs.

Who cares if an AI can create art? If one only cares about the creative aspect, humans can still make art; no one is stopping them. But really it's about money.

0

MisterBadger t1_j7rj1bf wrote

Machine learning algorithms are not even "intelligent" enough to filter out Getty watermarks.

They do not have minds or experiences, any more than zbrush or cinema4D or any other complicated software do.

Furthermore, they do not produce outputs like humans do - the speed and scale are more akin to automated car factories than human tinkers.

Fair use laws were not designed with them in mind.

1

trias10 t1_j7i0pq3 wrote

I personally don't care a whit about Stable Diffusion. AI should be going after rote, boring tasks via automation, not creativity and art. That's the one thing actually enjoyable about life, and the last thing we should be automating with stupid models that are just scaled matrix multiplication.

−12

ninjasaid13 t1_j7irv62 wrote

>That's the one thing actually enjoyable about life

Opinion.

4

MelonFace t1_j7khppc wrote

As if the rest of this whole thread isn't opinions, or opinions acting like facts about law that has not yet been explored.

1

FoveatedRendering t1_j7mlqeu wrote

If it's so enjoyable, all the more reason to automate it, so we get a lot more, and increasingly better, art.

Everyone enjoys art; only 0.0001% can make it. AI will give the 99.9999% of people who enjoy art more options, and give superpowers to the previous 0.0001% (and any creator) to make more art.

1

VeritaSimulacra t1_j7ibe8b wrote

Hopefully. Stability winning would decrease the openness of the internet. I already know of software projects that aren't being open-sourced to avoid becoming training data, and I'm sure artists will be much less likely to share openly as well.

0

trias10 t1_j7ibyhq wrote

As it should be. If openness of the internet means a few people getting rich off the back of training on large swathes of data without explicit permission, then it should be stopped.

OpenAI should pay for their own labelled datasets, not harvest the internet without explicit permission and then sell the result back as GPT-3 to get rich. This absolutely has to be punished and stopped.

−5

VeritaSimulacra t1_j7icqmp wrote

I agree with the goal, but I don’t think making the internet more closed is the way to go. The purpose of the internet is to be open. Making everything on the internet cost something would have a lot of negative effects on it. The solution to the powerful exploiting our openness isn’t to make it closed, but to regulate their usage of it.

3

trias10 t1_j7ifhdq wrote

I agree, hence I support this lawsuit and hope that Getty wins, which I hope leads to laws vastly curtailing which data AI can be trained on, especially when that data comes from artists and creators, who are already some of the lowest-paid members of society (unless they're in the lucky 0.01% of that group).

−2

currentscurrents t1_j7ipcip wrote

OpenAI is doing a good thing. They've found a new and awesome way to use data from the open web, and they deserve their reward.

Getty's business model is outdated now, and the legal system shouldn't protect old industries from new inventions. Why search for a stock image that sorta kinda looks like what you want, when you could generate one that matches your exact specifications for free?

−1

trias10 t1_j7irgui wrote

What good thing is OpenAI doing, exactly? I have yet to see any of their technologies used for any sort of societal good. So far the only things I have seen are cheating on homework and exams, faking legal documents, and serving as a dungeon master for D&D. The last one is kind of cool, but the first two are illegal.

Additionally, if you work in any kind of serious research division at a FAANG, you'd know there is a collective suspicion of OpenAI's work: their recent papers (or lack thereof, in ChatGPT's case) no longer describe the exact and specific data they used (beyond saying "the Internet"), and they no longer release their training code, making independent peer review and verification impossible and causing many to question whether their data was legally obtained. At any FAANG, you need to rope Legal into any discussion about data sources long before you begin training, and most data you see on the internet isn't actually usable unless there is an explicit licence allowing it, so a lot of data is off limits. OpenAI seems to ignore that, hence they never discuss their data specifics anymore.

We live in a world of laws and multiple social contracts; you can't just do as you please. Hopefully OpenAI is punished and restricted accordingly and starts playing by the same rules as everyone else in the industry. Fanboys such as yourself aren't helpful to the progress of responsible, legal, and ethical AI research.

2

currentscurrents t1_j7ivm0a wrote

>the only things I have seen are cheating on homework and exams, faking legal documents, and serving as a dungeon master for D&D. The last one is kind of cool, but the first two are illegal.

Well, that's just cherry-picking. LLMs could do very socially good things, like acting as an oracle for all internet knowledge or automating millions of jobs (assuming they can get the accuracy issues worked out, which tons of researchers are trying to do, some of whom are even on this sub).

By far the most promising use is allowing computers to understand and express complex ideas in plain English. We're already seeing uses of this: text-to-image generators use a language model to understand prompts and guide the generation process, and GitHub Copilot can turn English instructions into working code.
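
For instance, the prompt-understanding step the text-to-image example relies on looks roughly like this (a sketch with the diffusers library; the model ID and prompt are illustrative):

```python
# Sketch: the pipeline's CLIP text encoder interprets the English prompt,
# and the result conditions the image generator via cross-attention.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```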

I expect we'll see them applied to many more applications in the years to come, especially once desktop computers get fast enough to run them locally.

>starts playing by the same rules as everyone else in the industry.

Everyone else in the industry is also training on copyrighted data, because there is no source of uncopyrighted data big enough to train these models.

Also, your brain is updating its weights based on the copyrighted data in my comment right now, and that doesn't violate my copyright. Why should AI be any different?

5