phire t1_jects6y wrote on March 31, 2023 at 1:59 AM

It gets a bit more complicated.

OpenAI can't actually claim copyright on the output of ChatGPT, so licensing something trained on ChatGPT output as MIT should be fine from a copyright perspective. But OpenAI do have terms and conditions that forbid using ChatGPT output to train an AI... I'm not sure how enforceable that is, especially when people put ChatGPT output all over the internet, making it near impossible to avoid in a training set.

As for retraining the LLaMA weights... presumably Facebook do hold copyright on the weights, which is extremely problematic for retraining them and relicensing them.

pasr9 t1_jecwvck wrote on March 31, 2023 at 2:23 AM

Facebook do not hold copyright to the weights for the same reasons they do not hold copyrights to the output of their models. Neither the weights nor the output meet the threshold of copyrightability. Both are new works created out of a purely mechanical process that lack direct human authorship and creativity (two of the prerequisites required for copyright to apply).

For more information: https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence

phire t1_jed57od wrote on March 31, 2023 at 3:34 AM

Hang on, that guidance only covers generated outputs, not weights.

I just assumed weights would be like compiled code, which is also produced by a fully mechanical process but is copyrightable because of the inputs... Then again, most of the training data (by volume) going into machine learning models isn't owned by the company.

EuphoricPenguin22 t1_jedhyci wrote on March 31, 2023 at 5:45 AM

Using training data without explicit permission is (probably) considered to be fair use in the United States. There are some currently active court cases relating to this exact issue here in the U.S., namely Getty Images (US), Inc. v. Stability AI, Inc. The case is still far from a decision, but it will likely be directly responsible for setting a precedent on this matter. There are a few other cases happening in other parts of the world, and depending on where you are specifically, different laws or regulations may already be in place that clarify this specific area of law. I believe there is another case against Stability AI in the UK, and I've heard that the EU was considering adding or has added an opt-out portion of the law; I'm not sure.

phire t1_jedo041 wrote on March 31, 2023 at 7:03 AM

Perfect 10, Inc. v. Amazon.com, Inc. established that it was fair use for Google Images to keep thumbnail-sized copies of images because providing image search was transformative.

I'm not a lawyer, but thumbnails are way closer to the original than network weights, and AI image generation is arguably way more transformative than providing image search. I'd be surprised if Stability loses that suit.

pm_me_your_pay_slips t1_jee2xtt wrote on March 31, 2023 at 10:36 AM

Perhaps applicable to the generated outputs of the model, but it’s not a clear case for the inputs used as training data. It could very well end up in the same situation as sampling in the music industry, which is transformative, yet people using samples have to “clear” them by asking for permission (which usually involves money).

Sopel97 t1_jefa735 wrote on March 31, 2023 at 4:16 PM

"Terms and conditions" means that at worst OpenAI will restrict your access to ChatGPT, no?

artsybashev t1_jefp0o2 wrote on March 31, 2023 at 5:53 PM

Yes, the only thing they can do is ban you from their service.

pasr9 t1_jecwbqm wrote on March 31, 2023 at 2:19 AM

AI output is not currently copyrightable in the US.

ZetaReticullan t1_jedy3uf wrote on March 31, 2023 at 9:30 AM

Citation(s) please.

[deleted] t1_jeevamd wrote on March 31, 2023 at 2:39 PM

[removed]

Ronny_Jotten t1_jegl72r wrote on March 31, 2023 at 9:28 PM

That's an over-generalization, based on the ruling in the specific "Zarya of the Dawn" case. It doesn't necessarily apply to every case. It's determined on a case-by-case basis. See:

U.S. Copyright Office says some AI-assisted works may be copyrighted | Reuters

Jean-Porte t1_jedkyhk wrote on March 31, 2023 at 6:23 AM

Are the users responsible for using a model that was badly licensed?

sweatierorc t1_jegm93d wrote on March 31, 2023 at 9:35 PM

*insert if they could read meme*

[P] Introducing Vicuna: An open-source language model based on LLaMA 13B

cathie_burry t1_jechk0t wrote on March 31, 2023 at 12:25 AM

ktpr t1_jeco4so wrote on March 31, 2023 at 1:15 AM