Submitted by Balance- t3_124eyso in MachineLearning
cegras t1_je0gfd7 wrote
Reply to comment by rfxap in [N] OpenAI may have benchmarked GPT-4's coding ability on its own training data by Balance-
If you google most leetcode problems, I would bet a coffee that they existed on the internet long before leetcode came into existence.
MrFlamingQueen t1_je0j29h wrote
It feels like the majority of the people in this discussion have no idea what computer science is or what LeetCode tests.
As you mentioned, there are hundreds of websites devoted to teaching the leetcode design patterns and entire books devoted to learning and practicing these problems.
TheEdes t1_je149kf wrote
Yeah, but if you came up with a problem in your head that didn't exist word for word, then GPT-4 would be doing what they're advertising. However, if the problem appears word for word anywhere in the training data, then the testing data is contaminated. If the model can learn the design patterns for leetcode-style questions by looking at examples of them, it's doing something really good; if it can only solve problems it has seen before, then it's nothing special: they just overfit a trillion parameters on a comparatively very small dataset.
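The "word for word anywhere in the training data" test can be sketched very simply. This is a toy illustration, not how OpenAI actually deduplicates: the corpus and prompt below are placeholder strings, and a real check would run over billions of documents.

```python
# Minimal sketch of an exact-contamination check: normalize case and
# whitespace, then look for the benchmark prompt verbatim inside each
# training document. Corpus and prompt are toy placeholders.
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting can't hide a verbatim match."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def seen_verbatim(problem: str, corpus: list[str]) -> bool:
    """True if the problem statement appears word for word in any document."""
    needle = normalize(problem)
    return any(needle in normalize(doc) for doc in corpus)

corpus = [
    "blog post: Given an array of integers, return indices of the "
    "two numbers that add up to a target. Here is my solution..."
]
prompt = "Given an array of integers,\nreturn indices of the two numbers that add up to a target."
print(seen_verbatim(prompt, corpus))  # True: contaminated
```

Note this only catches exact duplicates; a single synonym swap defeats it, which is where the rewording argument below comes in.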
cegras t1_je2k9dr wrote
ChatGPT is great at learning the nuances of English, i.e. synonyms and metaphors. But if you feed it a reworded leetcode question and it finds the answer within its neural net, has it learned to conceptualize? No, it just learned that synonym ...
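Reworded duplicates like this are exactly what exact-match filters miss, so contamination studies often fall back on fuzzy near-duplicate detection. One common heuristic is Jaccard similarity over word n-gram "shingles": a paraphrase that keeps the problem's structure still shares most of its shingles. A rough sketch (the 3-word shingle size and any similarity threshold you'd pick are arbitrary choices, not anything OpenAI has documented):

```python
# Jaccard similarity over word trigram shingles: a cheap way to flag
# reworded near-duplicates that an exact substring match would miss.
def shingles(text: str, n: int = 3) -> set:
    """Set of overlapping word n-grams from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shared shingles / total shingles, in [0, 1]."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "given an array of integers return indices of the two numbers that add up to target"
reworded = "given a list of integers return indices of the two values that add up to target"
print(jaccard(original, reworded))  # 0.4, far above what unrelated texts score
```

Two unrelated problem statements typically score near 0, so even a crude threshold separates "synonym-swapped leetcode question" from "genuinely new problem".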
TheEdes t1_je6tweq wrote
Sure, but what's being advertised isn't sentience per se, at least for the leetcode part of their benchmarks. The issue here is that they claim it can do X% on leetcode, but it seems like it's much less on new data. Even if it had only learned to find previous solutions and tweak them, it should still be able to perform well, given the nature of the problems.
MrFlamingQueen t1_je3kywp wrote
Agreed. It's very likely contamination. Even "new" LeetCode problems existed before they were published on the website.
cegras t1_je0jsud wrote
Do you know if ChatGPT was allowed to ingest PDFs found on the internet? Even if not, I'm sure there are many sections of famous textbooks reproduced in HTML or parsable form.
ianitic t1_je0mjqx wrote
Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it reproduced them word for word. I suspect it had to have been trained on PDFs. I'm honestly surprised I haven't seen any news of authors/publishers suing yet.
Based on that test, it's obvious whether or not a book is part of its training set.
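That "quote me a page" membership test can be made a bit more rigorous by scoring the model's output against the real passage instead of eyeballing it. A minimal sketch using stdlib `difflib` (the two strings below are stand-ins; in practice you'd compare an actual model response against the actual page text):

```python
# Score how close a model's "recitation" is to the ground-truth passage.
# A ratio near 1.0 suggests verbatim memorization; a paraphrase scores low.
from difflib import SequenceMatcher

def verbatim_score(model_output: str, ground_truth: str) -> float:
    """Character-level similarity in [0, 1] via difflib's ratio."""
    return SequenceMatcher(None, model_output, ground_truth).ratio()

truth = "It was the best of times, it was the worst of times."
memorized = "It was the best of times, it was the worst of times."
paraphrase = "Times were simultaneously very good and very bad."

print(verbatim_score(memorized, truth) > 0.95)   # True: looks memorized
print(verbatim_score(paraphrase, truth) > 0.95)  # False: mere paraphrase
```

The threshold is a judgment call; the point is just that word-for-word reproduction and "knows roughly what the book says" are distinguishable.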
currentscurrents t1_je12d3k wrote
Nobody knows exactly what it was trained on, but there exist several datasets of published books.
>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.
They still might. But they don't have a strong motivation; it doesn't really impact their revenue directly, because nobody's going to sit in the ChatGPT window and read a 300-page book one prompt at a time.
mcilrain t1_je1a7cl wrote
Current tech could be used to allow you to ask an AI assistant to read you a book.
DreamWithinAMatrix t1_je3c6kl wrote
There was that time Google was taken to court for scanning and indexing books for Google Books or whatever and Google won:
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.
MrFlamingQueen t1_je0w3ut wrote
Not sure about the training corpus, but like you mentioned, there are tons of textbooks in other forms, plus solution manuals for textbook problems, on sites like GitHub, Stack Exchange, etc.
mcilrain t1_je19vif wrote
Even if it didn't ingest PDFs it probably ingested websites that scraped PDFs to spam search engine results.
SzilvasiPeter t1_je4pknf wrote
Should I bet a coffee? No way... that is too much of a deal.