Viewing a single comment thread. View all comments

unua_nomo t1_j2eyhnh wrote

I mean there are already open source datasets available, such as the Pile.

I can't see any argument for why a model derived on open source data would likewise not be open source, at which point if you could argue that a ML model could produce ip breaking content, that would be the responsibility of the individual producing and subsequently distributing that content.

As for data becoming stale, that wouldn't necessarily be an issue for plenty of applications, and even then there's no reason you couldn't just crowd fund 80k a year to train a newly updated model with newer content folded in.

4