cegras t1_je0jsud wrote

Do you know if ChatGPT was allowed to ingest PDFs found on the internet? Even if not, I'm sure there are many sections of famous textbooks reproduced in HTML or parsable form.

2

ianitic t1_je0mjqx wrote

Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it did, word for word. I suspect it had to have trained on PDFs? I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

Based on the above test, though, it's obvious whether or not a book is part of its training set.

10

currentscurrents t1_je12d3k wrote

Nobody knows exactly what it was trained on, but there exist several datasets of published books.

>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

They still might. But they don't have a strong motivation; it doesn't really directly impact their revenue because nobody's going to sit in the ChatGPT window and read a 300-page book one prompt at a time.

6

mcilrain t1_je1a7cl wrote

Current tech could be used to allow you to ask an AI assistant to read you a book.

3

MrFlamingQueen t1_je0w3ut wrote

Not sure about the training corpus, but like you mentioned, there are a ton of other forms of textbooks and solution manuals to textbook problems on sites like GitHub, Stack Exchange, etc.

3

mcilrain t1_je19vif wrote

Even if it didn't ingest PDFs, it probably ingested websites that scraped PDFs to spam search engine results.

1