Submitted by Balance- t3_124eyso in MachineLearning
rfxap t1_jdzfxd1 wrote
There are other benchmarks to look at though. Microsoft Research tried an early version of GPT-4 on LeetCode problems that were published after the training data cutoff date, and they got results similar to human performance in all difficulty categories: https://arxiv.org/abs/2303.12712 (page 21)
What should we make of that?
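The core of that evaluation is a cutoff-date filter: score the model only on problems it could not have seen in training. A minimal sketch of that idea, with the cutoff date and problem list invented purely for illustration:

```python
from datetime import date

# Hypothetical sketch of the evaluation setup described above: only score
# the model on problems published after its training-data cutoff, so the
# exact problem text cannot appear in the training corpus.
CUTOFF = date(2021, 9, 30)  # assumed cutoff date, for illustration only

problems = [
    {"title": "old-problem", "published": date(2021, 5, 1)},
    {"title": "new-problem", "published": date(2022, 1, 15)},
]

# Keep only problems published strictly after the cutoff.
fresh = [p for p in problems if p["published"] > CUTOFF]
```

Of course, as the replies below point out, filtering by publication date only guards against verbatim memorization, not against near-duplicates that predate the cutoff.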
abc220022 t1_jdzrsbu wrote
Part of the sales pitch behind LeetCode is that you are working on problems that are used in real coding interviews at tech companies. I believe most LeetCode problems were invented well before they were published on the LeetCode website, so they could still appear in some form in GPT-4's training data.
neonwatty t1_je06w3v wrote
absolutely
keepthepace t1_jdzp4ge wrote
Could parts of these LeetCode problems have been copied from the training dataset, or is there a guarantee that the problems are 100% novel?
londons_explorer t1_jdzwcfo wrote
Problems like this are never 100% novel.
There are always elements and concepts of the problem and solution that have been copied from other problems.
The easiest way to see this is to ask a non-programmer to come up with a 'programming puzzle'. They'll probably come up with something like "Make an app to let me know when any of my instagram friends are passing nearby and are up for hanging out".
Compare that to a typical leetcode problem, and you'll soon see that leetcode problems cover only a tiny corner of what is possible to do with computers.
currentscurrents t1_je13kdr wrote
True! But also, problems in general are never 100% novel. That's why metalearning works.
You can make up for poor reasoning abilities with lots of experience. This isn't bad exactly, but it makes testing their reasoning abilities tricky.
cegras t1_je0gfd7 wrote
If you google most leetcode problems I would bet a coffee that they've existed on the internet long before leetcode came into existence.
MrFlamingQueen t1_je0j29h wrote
It feels like the majority of the people in this discussion have no idea what computer science is or what LeetCode tests.
As you mentioned, there are hundreds of websites devoted to teaching the leetcode design patterns and entire books devoted to learning and practicing these problems.
TheEdes t1_je149kf wrote
Yeah, but if you came up with a problem that didn't exist word for word and GPT-4 solved it, then it would be doing what they're advertising. If the problem appears word for word anywhere in the training data, however, then the test data is contaminated. If the model can learn the design patterns behind leetcode-style questions by looking at examples of them, it's doing something really impressive; if it can only solve problems it has seen before, it's nothing special: they just overfit a trillion parameters on a comparatively very small dataset.
cegras t1_je2k9dr wrote
ChatGPT is great at learning the nuances of English, i.e. synonyms and metaphors. But if you feed it a reworded leetcode question and it finds the answer within its neural net, has it learned to conceptualize? No, it just learned that synonym ...
TheEdes t1_je6tweq wrote
Sure, but what's being advertised isn't sentience per se, at least for the leetcode part of their benchmarks. The issue is that they claim it scores X% on leetcode, but the score seems to be much lower on new data. Even if it had only learned to find previous solutions and adapt them with small changes, it should still perform well, given the nature of the problems.
MrFlamingQueen t1_je3kywp wrote
Agreed. It's very likely contamination. Even "new" LeetCode problems existed before they were published on the website.
cegras t1_je0jsud wrote
Do you know if ChatGPT was allowed to ingest PDFs found on the internet? Even if not, I'm sure there are many sections of famous textbooks reproduced in HTML or parsable form.
ianitic t1_je0mjqx wrote
Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it did, word for word. I suspect it must have been trained on PDFs. I'm honestly surprised I haven't seen any news of authors/publishers suing yet.
Based on that test, though, it's obvious whether or not a given book is part of its training set.
currentscurrents t1_je12d3k wrote
Nobody knows exactly what it was trained on, but there exist several datasets of published books.
>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.
They still might. But they don't have a strong motivation; it doesn't really directly impact their revenue because nobody's going to sit in the chatgpt window and read a 300-page book one prompt at a time.
mcilrain t1_je1a7cl wrote
Current tech could be used to allow you to ask an AI assistant to read you a book.
DreamWithinAMatrix t1_je3c6kl wrote
There was that time Google was taken to court for scanning and indexing books for Google Books or whatever and Google won:
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.
MrFlamingQueen t1_je0w3ut wrote
Not sure about the training corpus, but like you mentioned, there are tons of textbooks and solution manuals for textbook problems in other forms on places like GitHub, StackExchange, etc.
mcilrain t1_je19vif wrote
Even if it didn't ingest PDFs it probably ingested websites that scraped PDFs to spam search engine results.
SzilvasiPeter t1_je4pknf wrote
Should I bet a coffee? No way... that is too much of a deal.
VodkaHaze t1_je06t03 wrote
> LeetCode problems that were published after the training data cutoff date
A variation of those problems was likely on GitHub before they were posted?
milktoasttraitor t1_jdzuw0z wrote
If you look at the prompt they show, they clearly gave it hints which tell it the exact approach to use in order to solve the problem. The problem is also a very slight derivative of another existing, very popular problem on the platform (“Unique Paths”).
This is impressive in another way, but not in the way they were trying to show. They didn't show the other questions it got right, so there's no way of telling how good or bad the methodology was overall, or what hints they gave it. For that question at least, it's not good, and it makes me skeptical of the results.
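For reference, the "Unique Paths" problem mentioned above has a textbook dynamic-programming solution, which is exactly why a slight derivative of it is easy once the pattern is known. A minimal Python sketch (function name mine):

```python
def unique_paths(m: int, n: int) -> int:
    # Number of distinct paths from the top-left to the bottom-right
    # of an m x n grid, moving only right or down.
    row = [1] * n  # one path along the top edge
    for _ in range(1, m):
        for j in range(1, n):
            # paths into this cell = paths from above + paths from the left
            row[j] += row[j - 1]
    return row[-1]
```

For example, `unique_paths(3, 7)` counts the 28 monotone paths across a 3x7 grid. Once this pattern is in the training data, reworded variants test recall more than reasoning.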
keepthepace t1_jdzm2ic wrote
This is why peer review of articles is not something that should be skipped, even by Microsoft AI, sorry, "Open"AI.
hardmaru t1_jdznq2v wrote
Not sure if this article has been peer reviewed
But saw some “peer reviews” on Twitter :)
See: https://twitter.com/sleepinyourhat/status/1638988283018465300
RubenC35 t1_je0ra56 wrote
Wouldn't they be a little biased? I mean, Microsoft has spent loads of money on the idea of being the best.
salgat t1_je3eqx5 wrote
GPT4 is the world's best googler. As long as a similar solution existed on the internet in the past, there's a good chance GPT4 can pick it up, even if it's not on leetcode yet.
Nhabls t1_je93uvg wrote
The way they defined human performance there is just funny.
Dividing the number of accepted answers by the total number of users... they might as well just make up a number.
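As a toy illustration of why that metric is shaky (all numbers invented): dividing accepted answers by total users penalizes a problem for every casual visitor who never seriously attempted it.

```python
# Invented numbers, purely to show how the two denominators diverge.
accepted = 40_000     # assumed: users with an accepted submission
total_users = 400_000 # assumed: everyone who ever opened the problem
serious = 80_000      # assumed: users who made a genuine attempt

naive_rate = accepted / total_users  # the paper-style "human performance"
attempt_rate = accepted / serious    # success rate among real attempts
```

Here the naive rate is 10% while the rate among genuine attempts is 50%; comparing a model against the former makes humans look far weaker than they are.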