
JustOneAvailableName t1_j7le6dw wrote

> but commercial use requires opt-in consent from content creators

With an opt-in requirement, you might as well ban commercial use outright.

7

TaXxER t1_j7ojt22 wrote

As much as I like ML, it’s hard to argue that training ML models on data without consent, let alone on copyrighted data, would somehow be OK.

3

JustOneAvailableName t1_j7oknmi wrote

Copyright is about redistribution, and we're talking about publicly available data. I don't want or need to give consent to specific people or companies to allow them to read this comment. Nor do I think it should now be up to Reddit to decide what is and isn't allowed.

3

TaXxER t1_j7omop6 wrote

Generative models do redistribute though, often outputting near copies:

https://openaccess.thecvf.com/content/WACV2021/papers/Tinsley_This_Face_Does_Not_Exist..._But_It_Might_Be_Yours_WACV_2021_paper.pdf

https://arxiv.org/pdf/2203.07618.pdf

Copyright covers not only republishing but also derived work. I think it is a very reasonable position to consider any generative model output o on which some training set image Xi had a particularly large influence to be a derived work of Xi.
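To make the "near copy" idea concrete, here is a rough sketch of how one might flag an output o that sits very close to some training item Xi. Everything in it is an assumption for illustration: the function names, the toy embedding vectors, and the 0.95 threshold are hypothetical, and cosine similarity in an embedding space is only a crude proxy for "influence", not a legal test and not the method used in the linked papers.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_training_item(output_vec, training_vecs):
    """Return (index, similarity) of the training vector most similar to the output.

    Returns (-1, -1.0) when the training set is empty.
    """
    best_index, best_sim = -1, -1.0
    for i, vec in enumerate(training_vecs):
        sim = cosine_similarity(output_vec, vec)
        if sim > best_sim:
            best_index, best_sim = i, sim
    return best_index, best_sim

def is_near_copy(output_vec, training_vecs, threshold=0.95):
    """Flag an output whose nearest training item exceeds a (hypothetical) similarity threshold."""
    index, sim = nearest_training_item(output_vec, training_vecs)
    return sim >= threshold, index, sim

# Toy usage: embeddings here are made-up 2-D vectors, not real image features.
training = [[1.0, 0.0], [0.0, 1.0]]
flagged, index, similarity = is_near_copy([0.99, 0.05], training)
```

In practice the vectors would come from some image or code embedding model, and where to put the threshold is exactly the kind of question that is still open.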

A similar story holds for code generation models and software licensing: Copilot was trained on lots of software repos whose licenses require all derived work to be licensed under an at least equally permissive license. Copilot may very well output specific code snippets based closely on what it has seen in a particular repo, thereby potentially exposing the user to the licensing obligations that come with deriving work from that repo.

I’m an applied industry ML researcher myself, and I am very enthusiastic about the technology and the state of ML. But I also think that, as a field, we have unfortunately been careless about the ethical and legal aspects.

2