Ronny_Jotten
Ronny_Jotten t1_j6i3uog wrote
Reply to comment by CallFromMargin in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
I don't know what paper you're referring to, but there's this one:
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
It shows, right at the top of the first page, the full Stable Diffusion model, trained on billions of LAION images, generating images that are clearly "substantially similar" to images in its training data, i.e. copyright violations. The paper also cites several other papers on the ability of large models to memorize their training inputs.
It may be possible to tweak the generation algorithm to no longer output such similar images, but it's clear that they are still present in the trained model network.
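As a purely illustrative sketch of how such replication can be flagged (this is not the paper's exact method; it uses a dedicated copy-detection model, and the file names and threshold below are hypothetical), one can compare embeddings of a generated image and a candidate training image:

```python
# Illustrative sketch only: flag a generated image as a possible near-replica
# of a training image by comparing image embeddings. The cited paper uses a
# dedicated copy-detection model; CLIP is used here as a readily available
# stand-in. File names and the threshold are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(path: str) -> torch.Tensor:
    """Return a normalized CLIP image embedding for one image file."""
    image = Image.open(path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    return features / features.norm(dim=-1, keepdim=True)

generated = embed("generated_sample.png")   # hypothetical generated output
training = embed("laion_candidate.png")     # hypothetical training image

similarity = (generated @ training.T).item()
print(f"cosine similarity: {similarity:.3f}")
# A very high score (say > 0.95) suggests near-duplication worth inspecting
# by eye; the cutoff here is arbitrary, not taken from the paper.
```

"Replication" in the paper's sense then just means generated images that score suspiciously high against some training image; whether that also amounts to legal "substantial similarity" is a separate question for the courts, not a threshold.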
Ronny_Jotten t1_j6hzcpu wrote
Reply to comment by CallFromMargin in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
> The computer is not storing a copy of original work in trained model. It looks at picture, it learns stuff from it and it stores only what it learns.
Just because you anthropomorphize the computer as "looking at" and "learning stuff" doesn't mean it isn't digitally copying and storing enough of the original work, in a highly compressed form within the neural network, to violate copyright by producing something "substantially similar": Image-generating AI can copy and paste from training data, raising IP concerns | TechCrunch
But regardless of whether it produces a "substantially similar" work as output, copying the original copyrighted work into the computer in the first place is a required step in training the AI network. Doing so is only legal if it qualifies as fair use. That was the question in the Google Books case: the scanning of books was found to be fair use, because Google didn't use it to create new books or otherwise economically damage the authors or the market for the original books. But that's not necessarily the case with every instance of making digital copies of copyrighted works.
> Your argument is based either on fundamental misconception on your part, or a flat out lie from you. Neither one casts you in good light
Well, you can fuck off with that, dude. There's no call for that kind of personal attack.
Ronny_Jotten t1_j6hx2hb wrote
Reply to comment by ostrichpickle in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
> If A.I. can't use others copyrighted work to learn and train, why can people?
But it is allowed to use copyrighted works to train an AI - as long as it constitutes fair use. What's probably not fair use, though, is to sell or flood the market with cheap works produced by a machine, if that negatively impacts the market for the original works it's trained on. Copyright laws make a distinction between humans and machines, because they're not the same thing. For example, works created solely by non-humans, whether a machine or a monkey, can't be copyrighted. According to the US Copyright Office, copyright requires "the nexus between the human mind and creative expression".
Ronny_Jotten t1_j6hujf1 wrote
Reply to comment by OfCourse4726 in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
> i feel like works produced by ai probably can not be copyrighted.
The US has already said it won't grant copyright to machine-produced works, because they lack the required creativity: The US Copyright Office says an AI can’t copyright its art - The Verge
Ronny_Jotten t1_j6htzkr wrote
Reply to comment by vgf89 in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
> I'm fairly certain that every one of these 4 points can be argued in favor of generative AI's.
Ok, but you haven't actually done that. You only argue that it makes things more convenient and cheap for the users, who no longer have to hire the actual programmers or artists whose work it samples and undercuts. That's exactly the thing that could cause it to fail the fourth rule for fair use.
Ronny_Jotten t1_j6hspu6 wrote
Reply to comment by CallFromMargin in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
It's literally not the same thing though, at least legally speaking. It's already accepted that a human looking at an artwork is not "making a copy" as defined in the copyright laws. As long as they don't produce a "substantially similar" work, there's no copyright violation. The same can't be said for scanning or digitally copying a work into a computer; that is "making a copy" that's covered by the copyright laws. In some cases, that can come under the "fair use" exemption. But not in all cases. It's evaluated case by case; in the US, according to the four-factor fair use test. For example, if it's found that the generated works have a negative economic impact on the value of the original works, there's a substantial chance that it won't be found to be fair use.
Ronny_Jotten t1_j6hrjni wrote
Reply to comment by CallFromMargin in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
> In that specific case, no. Fair use laws cover that, and Google vs author guild had solved that specific case in court. Using your work falls under fair use, just like human reading your work and incorporating ideas in his/her own work.
That's completely false. The Google case was found to be fair use precisely because it did not "dilute the market for writing". That's one of the four legal factors for fair use. The judge said that it did not produce anything that competed economically in the market for the books that were scanned; on the contrary, it might increase their sales. Whether such scanning is fair use is determined on a case-by-case basis. If AIs are being used to produce "new" works that are sold commercially and undercut the authors of the originals they're based on, it will be much more difficult to prove fair use.
Furthermore, the Copilot product creates a loophole where a business can take code released under e.g. the GPL, a license that requires derivative works to be released under the same open-source terms, and make its derivative closed-source instead. That can also create an unfair economic advantage in the market. These questions are far from "solved".
Ronny_Jotten t1_j6hpnnj wrote
Reply to comment by IAmDrNoLife in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
> They don't store any original art used in the training [...] these models do not replicate the art it has been trained on. Every single piece of art generated by AI, is something entirely new. Something that has never been seen before. You can debate if it takes skill, but you can't debate that it's something new
They can very easily reproduce images and text that are substantially similar to the training input, to the extent that it is clearly a copyright violation.
Image-generating AI can copy and paste from training data, raising IP concerns | TechCrunch
> courts have indeed shown previously that Google IS allowed to data mine a bunch of data [...] there's a difference [...] But the focus here was on the data mining.
In the case of the Google Books search product, the scanning of copyrighted works ("data mining") was found to be fair use. That absolutely does not mean that all data mining is fair use. Importantly, it was found to have no negative economic impact on the market for the actual books; it did not replace the books. For the code/text/image AI generators' "data mining" of copyrighted works to be fair use, it will also have to meet that test. Otherwise, the mining is a copyright violation.
Ronny_Jotten t1_j4dafyz wrote
Reply to comment by marr75 in [D] Is MusicGPT a viable possibility? by markhachman
Sorry, but that's entirely false. See my other comment. The US fair use test was created in 1841. The Google case only found that its book search product passed the test, including the publication of "snippets" not having a negative impact on the market for books. That doesn't mean every other arguably-similar project passes the test too. They would need to show that, for example, generated images do not impact the market for images made by the artists whose work was scanned - which is obviously not the case. The situation with generative neural networks is not at all "well settled" by the case about Google's book search.
Ronny_Jotten t1_j4d824g wrote
Reply to comment by marr75 in [D] Is MusicGPT a viable possibility? by markhachman
You're right, sorry, I had several tabs open on a similar subject... the post I was referring to is this:
> The multi-part fair use test established in AGI vs Google is widely held to be applicable to AI and ML models.
The US four-factor fair use test was established long before Authors Guild v. Google: in the 19th century, in Folsom v. Marsh. It was codified into copyright law in 1976. It's applicable to everything.
The case only decided that Google's specific book service did in fact pass the test. The most important aspect is that the judge found that there was no economic damage to the book authors, that it did not replace the books or negatively impact the market for books.
The decision is not applicable to other projects that may be substantially different in character. I'm sure OpenAI's lawyers are hoping that DALL-E will be considered to be equivalent to Google's book search - that they have fair use rights to digitize copyrighted material without permission, and publish something transformed that only contains "snippets" of it. But they will have to get around the fourth factor. Who will commission an expensive original artwork from Greg Rutkowski, when they can simply type a prompt including "in the style of..." and get something substantially similar, for less than a nickel? Will companies use GPL3 code in their products, when they can get a mashed-up facsimile with the restrictive license removed? The question of fair use in the context of generative neural networks is far from settled; hence the lawsuits in the (other) post.
Ronny_Jotten t1_j4b65xn wrote
Reply to comment by Mefaso in [D] Is MusicGPT a viable possibility? by markhachman
It won't directly stop research, because that's fair use. It may well stop commercial exploitation of the research, at least to some extent. If so, companies would be less willing to invest in research, so it would have a chilling effect on the research anyway. But copyright issues can be worked out, if there's money to be made. It's just a question of how it's collected and to whom it's distributed...
Ronny_Jotten t1_j4b5fqx wrote
Reply to comment by Kafke in [D] Is MusicGPT a viable possibility? by markhachman
Images and text are already quite different from each other though, in terms of AI generators. The image generators include a language model, but work on a diffusion principle that the text generators don't use. Riffusion's approach of running a diffusion image generator on spectrograms is interesting to some extent, but I sincerely doubt it will be the future direction of high-quality music generators.
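For what it's worth, here's a rough, hypothetical sketch of the spectrogram-as-image idea behind Riffusion: render audio as a spectrogram picture that an ordinary image diffusion model could be trained on. The file names and parameters below are made up for illustration, and the hard part, turning generated spectrograms back into audio, is omitted.

```python
# Illustrative only: turn an audio clip into a spectrogram image of the kind
# an image diffusion model could be trained on. File names and parameters
# are hypothetical; reconstructing audio from generated spectrograms (the
# hard part) is not shown.
import numpy as np
import librosa
from PIL import Image

audio, sr = librosa.load("clip.wav", sr=22050, mono=True)  # hypothetical input file

# Mel spectrogram in decibels - this 2D array is what gets treated as an "image".
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=512, hop_length=512)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Scale to 0-255 and save as a grayscale image, with low frequencies at the bottom.
scaled = (255 * (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min())).astype(np.uint8)
Image.fromarray(scaled[::-1]).save("clip_spectrogram.png")
```

Going the other way requires inverting the spectrogram back to a waveform (e.g. Griffin-Lim phase reconstruction), which is lossy - part of why I doubt this is the long-term route to high-quality music generation.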
Ronny_Jotten t1_j4b3rjl wrote
Reply to comment by mycall in [D] Is MusicGPT a viable possibility? by markhachman
There is no such 30% rule. Tests for copyright infringement are much more complex. And even if there were, a copyrighted work changed by a certain amount, so that it can be copyrighted itself, will still be a derivative work, subject to the original copyright.
Weird Al can do what he does because it's parody, and there are fair use and freedom of speech exceptions for that in copyright law. Try changing a Beatles song by 30%, in a non-parody way, and see how far you get with publishing it...
Ronny_Jotten t1_j4b2p0z wrote
Reply to comment by marr75 in [D] Is MusicGPT a viable possibility? by markhachman
That's not remotely true. There was a Google case, but that was about creating a books search database, not actually selling AI-produced books. The lawsuits against Microsoft etc. are proceeding, and in the meantime many other major companies are staying (or backing) away from selling AI-produced content until it's clear what the legal situation is. It's certainly not settled.
Ronny_Jotten t1_j4b1q88 wrote
Reply to comment by itsnotlupus in [D] Is MusicGPT a viable possibility? by markhachman
Non-commercial use doesn't give you a pass on copyright infringement. It's just that the punishment is less severe. You can't freely share your music and movie libraries on Bittorrent. You can still get cease and desist orders, DMCA takedown notices, fines, loss of Internet, etc. (depending where you live).
Ronny_Jotten t1_j4b1bln wrote
Reply to comment by marr75 in [D] Is MusicGPT a viable possibility? by markhachman
Copyright is an enormous issue for AI models - did you not read the post? [Oops, I meant this post.] Have you not heard everyone talking about it lately? The Google case is irrelevant to this question. It was decided that Google building a search database of books was fair use, and didn't have an adverse economic impact on the books' authors - on the contrary, it boosted sales.
Had Google built an AI trained on the books' content, and then generated books for sale, it would have been a different outcome.
Ronny_Jotten t1_j3t62bb wrote
Reply to comment by midasza in Nvidia GeForce RTX 4000 GPUs Coming to Gaming Laptops Next Month by Jack-Robert22
That's the way it's always been: laptop versions have much less capability. Why would they change now?
Ronny_Jotten t1_j3t5tvk wrote
Reply to comment by IslandChillin in Nvidia GeForce RTX 4000 GPUs Coming to Gaming Laptops Next Month by Jack-Robert22
Why do you have to fathom it? It says right there that it starts at $1999.
Ronny_Jotten t1_j2zqtt8 wrote
Reply to [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
What does "truly outperform humans" mean? It sounds so broad, some kind of philosophical question, like "How many angels can dance on the head of a pin?". What are you asking? Can a machine truly outperform a human at climbing, hammering, flying, calculating, sorting, or drawing accurate conclusions in a limited domain given a certain input? Of course, obviously. Can it truly outperform a human at falling in love, tasting an apple, or getting drunk on wine? No.
Humans have always been augmented by their tools. That is one of the fundamental characteristics of being human. At the tasks they're designed for, artificial tools vastly increase the performance of humans, and allow them to outperform what they could do without them. Humans have enhanced their cognitive abilities with all kinds of calculating and "thinking" machines, for millennia. A human with a clay tablet and a reed can far outperform other humans at remembering things. But what is a clay tablet - or a PC, or anything - without humans? Nothing, as far as humans are concerned.
Ronny_Jotten t1_j2fcaq1 wrote
Reply to comment by lucidraisin in An Open-Source Version of ChatGPT is Coming [News] by lambolifeofficial
> my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.
Sure, but I didn't say anything about your other repositories. I said that this particular repository is a proof of concept, in the sense that it demonstrates working code that could serve in the development of a future open-source ChatGPT-like system, but such a system, as you say, is not imminent. It's great that you're working towards it though!
Ronny_Jotten t1_j2f9tcx wrote
Reply to comment by yeeeshwtf in "You can use multiple words to describe something" Germany: by Nox_Dei
It's meant for cracking egg shells. There is no restriction that it must only ever be used for soft-boiled eggs, never for hard-boiled (the manufacturer of the "Clack" says it's for "boiled eggs", and has a photo that appears to be hard-boiled) or raw eggs (as shown in the photo you linked). The idea that this is "wrong" use by people is just your wrong opinion.
Ronny_Jotten t1_j2f34ef wrote
Reply to comment by FastWalkingShortGuy in "You can use multiple words to describe something" Germany: by Nox_Dei
English speakers might call what is generically known in German as an Eierköpfer, an "egg opener", or an "egg topper", "egg cracker", "egg cutter", etc. But the name of this particular product, Eierschalensollbruchstellenverursacher, translates to something like "eggshell break-point creator".
Ronny_Jotten t1_j6i68gn wrote
Reply to comment by CallFromMargin in Microsoft, GitHub, and OpenAI ask court to throw out AI copyright lawsuit by Tooskee
And yet, the citation I gave shows Stable Diffusion obviously replicating copyrighted images from the LAION training set, despite your musings about thermodynamics. It may not store reproducible representations of all images, I don't know - but it unquestionably does store some.
In any case, it doesn't change the fact that copying images into the computer in the first place, in order to train the model, would need to come under a fair use exemption. For example, research generally does - but not in every case, especially if it causes economic damage to the original authors. In many countries, authors also have moral rights, to attribution, to preservation of the integrity of their work against alteration that damages their reputation, etc., which may come into play.