unua_nomo t1_j2enydh wrote on December 31, 2022 at 6:27 PM

Reply to comment by misconfigbackspace in There's now an open source alternative to ChatGPT, but good luck running it by ravik_reddit_007

Crowdsource the funding, not the content the model is trained on

misconfigbackspace t1_j2erpp0 wrote on December 31, 2022 at 6:53 PM

One-time funding is fairly easy. Getting a copy of that data is a little harder. That data will also go stale in real time as the world moves forward, so that's the other big thing to keep in mind. I wonder what legal challenges will come up if the model copies material from litigious IP owners like Disney, the top music artists, Hollywood and the like.

unua_nomo t1_j2eyhnh wrote on December 31, 2022 at 7:40 PM

I mean, there are already open source datasets available, such as the Pile.

I can't see any argument for why a model trained on open source data would not likewise be open source. At that point, even if you could argue that an ML model could produce IP-infringing content, the responsibility would lie with the individual producing and subsequently distributing that content.

As for data becoming stale, that wouldn't necessarily be an issue for plenty of applications, and even then there's no reason you couldn't just crowdfund 80k a year to train an updated model with newer content folded in.

misconfigbackspace t1_j2ez1sa wrote on December 31, 2022 at 7:44 PM

> such as the Pile.

TIL. Thanks.

syfari t1_j2fekeo wrote on December 31, 2022 at 9:37 PM

Challenges over diffusion models are already popping up from artists. A lot of this has already been settled, though, as courts have determined that model training falls under fair use.