cjschnyder

cjschnyder t1_ixd1qtx wrote

So that's the first time a specific dataset, or rather a dataset maker, was mentioned. While that one may be static, others may not be. It should also be noted that LAION is making other datasets, so while ONE of their sets might be static and complete, they are continuously making updated sets.

I would also hesitate to say it's a "matter of opinion" in the stealing department. It is stealing. It's so commonplace on the internet today that it goes largely unnoticed, but just because something is in a public space does not mean it is public domain.

What is the difference between what you posed, someone taking a specific artist's work to make a dataset to copy them, and a dataset that includes enough of someone's art to copy them and is then used in such a manner?

1

cjschnyder t1_ixcqy7t wrote

It sounds like the big assumption here is that scraping = taking literally anything and everything. You're correct in the sense that it's semi-curated, but my guess is that no AI generator is doing that with teams of people each looking across the web and loading images into a dataset. It's doing it with bots. And those bots are scraping the web.

What they pull definitely goes through a filtering process, but just because it's a "semi-curated dataset" doesn't mean that data wasn't scraped. In my day job I build analytics data pipelines for web traffic, so I'm familiar with how vast amounts of data are aggregated and put together.

1

cjschnyder t1_ixcp6hy wrote

"That's absolutely not how it works." That's absolutely how it works. That's why, when prompting AI art generators, you can ask for a specific artist's style: that artist's work was fed into the machine for it to learn from. And I'd bet that the VAST majority of artists were not asked permission.

As for the effort that went into this, I can't tell. It's so obviously AI art that there's no way to know what other manipulation the poster might have done.

3