
currentscurrents t1_j7innd5 wrote

Getty is just the test case for the question of copyright and AI.

If you can't train models on copyrighted data, they can't learn information from the web outside of specific openly-licensed websites like Wikipedia. This would sharply limit their usefulness. It also seems distinctly unfair, since copyright is only supposed to protect a specific arrangement of words or pixels, not the information they contain or the artistic style they're rendered in.

The big tech companies can afford to license content from Getty, but us little guys can't. If they win, it will effectively kill open-source AI.

16

trias10 t1_j7iv9iy wrote

Data is incredibly valuable; OpenAI and Facebook have proven that. Ever-bigger models require ever more data. And we live in a capitalist world, so if something is valuable, like data, you typically have to pay for it. So open-source AI shouldn't be a thing.

Also, OpenAI is hardly open source anymore. They no longer disclose their data sources, data-harvesting practices, or data methodologies, nor do they release their training code. They also no longer release their trained models.

If they were truly open source, I could maybe see defending them, but at the moment all I see is a company violating data privacy and licences to get incredibly rich.

1

HateRedditCantQuitit t1_j7l1m4c wrote

>If you can't train models on copyrighted data this means that they can't learn information from the web outside of specific openly-licensed websites like Wikipedia. This would sharply limit their usefulness.

That would be great. It could lead to a future with things like copyleft data, where if you want to train on open stuff, your model legally *must* be open.

1