hybridteory t1_iv0f5y5 wrote
Reply to comment by ComplexColor in [D] DALL·E to be made available as API, OpenAI to give users full ownership rights to generated images by TiredOldCrow
Yes, I find it incredibly strange that when discussing Codex, everyone worries about the model regurgitating the code it was trained on, citing the GPL and other licenses; yet this seems to be much less of an issue when it comes to images (going by anecdotal evidence from these discussions), even though images carry licenses too. It just goes to show that humans perceive text and images very differently from a creative point of view.
farmingvillein t1_iv2bbw6 wrote
-
If there can be a lawsuit, there eventually certainly will be one.
-
The issues here are, for now, different. The current claim is that Codex is copy-pasting material that requires attached licenses. (Whether that is true will of course be played out in court.) For image generation, no one has yet claimed that these systems emit straight copies, at any meaningful scale, of someone else's original pictures.
hybridteory t1_iv2ebe5 wrote
Codex is not technically copy-pasting; it is generating a new output that is (almost) exactly the same as the input, or indistinguishable from it to the eyes of a human. That sounds like semantics, but no actual copying takes place. There are already music-generation algorithms that can produce short samples indistinguishable from their inputs (memorisation). DALL·E 2 is not there yet, but we are close to prompting "Original Mona Lisa painting" and being given back an image strikingly similar to the original. Several generative image models have been shown to largely memorise the inputs used to train them (a quick example found via Google: https://github.com/alan-turing-institute/memorization).
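The memorisation checks done in work like the repository above can be sketched very roughly: compare each generated sample against the training set and flag near-duplicates. This toy version uses raw-pixel cosine similarity as a stand-in for the perceptual or embedding-based metrics real studies use; `memorization_score` is a hypothetical helper, not from any cited codebase.

```python
import numpy as np

def memorization_score(generated, training_set):
    """Return the highest cosine similarity between a generated
    image and any training image. Values near 1.0 suggest the
    sample may be a near-duplicate of a training image.
    All images are arrays of identical shape."""
    g = generated.ravel().astype(np.float64)
    g /= np.linalg.norm(g) + 1e-12
    best = -1.0
    for t in training_set:
        v = t.ravel().astype(np.float64)
        v /= np.linalg.norm(v) + 1e-12
        best = max(best, float(g @ v))
    return best

# Toy demo with random 8x8 "images".
rng = np.random.default_rng(0)
train = [rng.random((8, 8)) for _ in range(5)]
copy = train[2] + rng.normal(0, 0.01, (8, 8))  # near-duplicate of a training image
novel = rng.random((8, 8))                     # unrelated sample

memorized = memorization_score(copy, train) > 0.99
fresh = memorization_score(novel, train) > 0.99
```

In practice a raw-pixel metric is far too brittle (a one-pixel shift breaks it), which is why real memorisation studies use learned embeddings or perceptual distances; this only illustrates the nearest-neighbour idea.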
farmingvillein t1_iv2vqmx wrote
> Codex is not technically copy-pasting; it is generating a new output that is (almost) exactly the same as the input, or indistinguishable from it to the eyes of a human.
Nah, it is literally generating duplicates. That is copying in the eyes of the law. Whether it amounts to an actual legal problem remains to be seen.
> DALL·E 2 is not there yet, but we are close to prompting "Original Mona Lisa painting" and being given back an image strikingly similar to the original.
This is confused. DALL·E 2 is "not there yet", as a general statement, because it has been specifically trained not to do this.
hybridteory t1_iv30cij wrote
There is nothing about diffusion models that stops them from memorising data. DALL·E 2 can definitely memorise.
farmingvillein t1_iv38uzt wrote
That is my point? I'm not sure how to square your (correct) statement with your prior statement:
> DALL·E 2 is not there yet, but we are close to prompting "Original Mona Lisa painting" and being given back an image strikingly similar to the original.