
Due_Cauliflower_9669 t1_j6gtawv wrote

Where does “better training data” come from? These bots are using data from the open web. The open web is full of good stuff but also a lot of bullshit. Training on it means the bots keep learning from a mix of high-quality and low-quality data.


Due_Cauliflower_9669 t1_j6gs5xa wrote

These tools seem to be using others’ content for more than just training. There is growing anecdotal evidence that the content the AI creates contains recognizable segments/samples of other creators’ work. As in, chatbots recite entire sentences and paragraphs of others’ work in their answers, and image generators replicate parts of known images by other artists in their output. Not sure that qualifies as fair use, especially if OpenAI seeks to profit off the content its technology generates.