rfxap t1_jdzfxd1 wrote

There are other benchmarks to look at though. Microsoft Research tried an early version of GPT-4 on LeetCode problems that were published after the training data cutoff date, and they got results similar to human performance in all difficulty categories: https://arxiv.org/abs/2303.12712 (page 21)

What should we make of that?

277

abc220022 t1_jdzrsbu wrote

Part of LeetCode's sales pitch is that you are working on problems used in real coding interviews at tech companies. I believe most LeetCode problems were invented well before they were published on the LeetCode website, so they could still appear in some form in the training data.

350

keepthepace t1_jdzp4ge wrote

Could parts of the training dataset have made their way into these LeetCode problems, or is there a guarantee that the problems are 100% novel?

53

londons_explorer t1_jdzwcfo wrote

Problems like this are never 100% novel.

There are always elements and concepts of the problem and solution that have been copied from other problems.

The easiest way to see this is to ask a non-programmer to come up with a 'programming puzzle'. They'll probably come up with something like "Make an app to let me know when any of my instagram friends are passing nearby and are up for hanging out".

Compare that to a typical LeetCode problem, and you'll soon see that LeetCode problems are really only a tiny, tiny corner of what is possible to do with computers.
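
For contrast, a typical LeetCode problem looks something like the classic "Valid Parentheses" exercise sketched below; the choice of problem here is purely illustrative, not something from this thread or the paper:

```python
def is_valid(s: str) -> bool:
    """Classic LeetCode-style exercise: check whether every bracket
    in the string is closed by the matching bracket in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)          # remember the opener
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False          # closer with no matching opener
    return not stack                  # every opener must have been closed

assert is_valid("([]{})") is True
assert is_valid("(]") is False
```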

93

currentscurrents t1_je13kdr wrote

True! But also, problems in general are never 100% novel. That's why metalearning works.

You can make up for poor reasoning abilities with lots of experience. This isn't bad exactly, but it makes testing their reasoning abilities tricky.

15

cegras t1_je0gfd7 wrote

If you google most LeetCode problems, I would bet a coffee that you'd find they existed on the internet long before LeetCode came into existence.

27

MrFlamingQueen t1_je0j29h wrote

It feels like the majority of the people in this discussion have no idea what computer science is or what LeetCode actually tests.

As you mentioned, there are hundreds of websites devoted to teaching the LeetCode design patterns, and entire books on learning and practicing these problems.
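
One such pattern is the sliding window; a minimal sketch under a made-up problem statement (largest sum of any length-k window), just to show how little code a drilled pattern reduces to:

```python
def max_window_sum(nums: list[int], k: int) -> int:
    """Sliding-window template: largest sum of any contiguous
    subarray of length k, computed in a single pass."""
    window = sum(nums[:k])                # sum of the first window
    best = window
    for i in range(k, len(nums)):
        window += nums[i] - nums[i - k]   # slide the window right by one
        best = max(best, window)
    return best

assert max_window_sum([2, 1, 5, 1, 3, 2], k=3) == 9  # subarray [5, 1, 3]
```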

33

TheEdes t1_je149kf wrote

Yeah, but if you were to come up with a problem in your head that doesn't exist word for word anywhere, then GPT-4 would be doing what they're advertising. If, however, the problem appears word for word somewhere in the training data, then the test data is contaminated. If the model can learn the design patterns behind LeetCode-style questions by looking at examples of them, it's doing something genuinely impressive; if it can only solve problems it has already seen, it's nothing special: they just overfit a trillion parameters on a comparatively tiny dataset.

8

cegras t1_je2k9dr wrote

ChatGPT is great at learning the nuances of English, i.e. synonyms and metaphors. But if you feed it a reworded LeetCode question and it finds the answer within its neural net, has it learned to conceptualize? No, it has just learned that synonym ...
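
To make the rewording point concrete, here is a hypothetical paraphrased prompt; under the surface it is just LeetCode's "Two Sum", so the memorized hash-map solution transfers unchanged (the wording and numbers below are invented for illustration):

```python
def find_pair(prices: list[int], budget: int) -> tuple[int, int] | None:
    """Reworded prompt: 'given item prices and a gift-card budget, pick two
    items that use the budget up exactly.' Structurally this is 'Two Sum',
    so the standard one-pass hash-map solution applies as-is."""
    seen = {}                              # price -> index
    for i, price in enumerate(prices):
        remainder = budget - price
        if remainder in seen:
            return seen[remainder], i      # found the complementary price
        seen[price] = i
    return None

assert find_pair([150, 24, 79, 50], budget=200) == (0, 3)  # 150 + 50
```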

8

TheEdes t1_je6tweq wrote

Sure, but what's being advertised isn't sentience per se, at least for the LeetCode part of their benchmarks. The issue is that they claim it scores X% on LeetCode, yet it seems to do much worse on new data. Even if all it had learned was to retrieve previous solutions and adapt them with small changes, it should still perform well, given the nature of the problems.

1

MrFlamingQueen t1_je3kywp wrote

Agreed. It's very likely contamination. Even "new" LeetCode problems existed before they were published on the website.

2

cegras t1_je0jsud wrote

Do you know if ChatGPT was allowed to ingest PDFs found on the internet? Even if not, I'm sure there are many sections of famous textbooks reproduced in HTML or some other parsable form.

2

ianitic t1_je0mjqx wrote

Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it reproduced them word for word. I suspect it had to have been trained on PDFs? I'm honestly surprised I haven't seen any news of authors/publishers suing yet.

Based on that test, though, it's obvious whether or not a book is part of its training set.
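
A rough sketch of that kind of memorization probe, using the pre-1.0 OpenAI Python client that was current when this thread was written; the prompt wording, the model choice, and the use of a text-similarity score are assumptions for illustration, not something described in the comment:

```python
import difflib
import openai  # legacy 0.27-style client interface

openai.api_key = "YOUR_API_KEY"  # placeholder

def memorization_score(excerpt: str, title: str, author: str) -> float:
    """Ask the model to reproduce a passage it was never shown in the prompt,
    then measure how close its output is to the real text (0.0 to 1.0)."""
    prompt = (
        f"Quote the opening paragraph of '{title}' by {author}, "
        "word for word, with no commentary."
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",   # or "gpt-4"
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    generated = response["choices"][0]["message"]["content"]
    return difflib.SequenceMatcher(None, excerpt, generated).ratio()

# A near-1.0 score on text the model could not infer from the prompt alone
# suggests the passage (or a close copy) was in the training data.
```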

10

currentscurrents t1_je12d3k wrote

Nobody knows exactly what it was trained on, but there exist several datasets of published books.

>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

They still might. But they don't have a strong motivation: it doesn't directly impact their revenue, because nobody is going to sit in the ChatGPT window and read a 300-page book one prompt at a time.

6

mcilrain t1_je1a7cl wrote

Current tech could already let you ask an AI assistant to read you a book.

3

MrFlamingQueen t1_je0w3ut wrote

Not sure about the training corpus, but as you mentioned, there are tons of textbooks reproduced in other forms, plus solution manuals to textbook problems, on sites like GitHub, Stack Exchange, etc.

3

mcilrain t1_je19vif wrote

Even if it didn't ingest PDFs it probably ingested websites that scraped PDFs to spam search engine results.

1

SzilvasiPeter t1_je4pknf wrote

Should I bet a coffee? No way... that's too big a wager for me.

1

VodkaHaze t1_je06t03 wrote

> LeetCode problems that were published after the training data cutoff date

Variations of those problems were likely on GitHub before they were even posted, no?

25

milktoasttraitor t1_jdzuw0z wrote

If you look at the prompt they show, they clearly gave it hints that spell out the exact approach to use to solve the problem. The problem is also a very slight derivative of another existing, very popular problem on the platform (“Unique Paths”).

This is impressive in its own way, but not in the way they were trying to show. They didn't show the other questions it got right, so there's no way to tell how good or bad the methodology was overall, or what hints they gave it. For that question at least, it's not good, and it makes me skeptical of the results.
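
For reference, “Unique Paths” is one of the most rehearsed dynamic-programming exercises on the platform, which is part of why a near-duplicate of it is a weak test case. A sketch of the standard solution (not taken from the paper's prompt or output):

```python
def unique_paths(m: int, n: int) -> int:
    """Count paths from the top-left to the bottom-right of an m x n grid
    when you may only move right or down (classic rolling 1-D DP)."""
    row = [1] * n                 # one way to reach each cell in the first row
    for _ in range(1, m):
        for j in range(1, n):
            row[j] += row[j - 1]  # paths from above + paths from the left
        # row[0] stays 1: only one way straight down the first column
    return row[-1]

assert unique_paths(3, 7) == 28
assert unique_paths(3, 2) == 3
```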

20

RubenC35 t1_je0ra56 wrote

Would they be a little biased? I mean, Microsoft has spent loads of money on the idea of being the best.

2

salgat t1_je3eqx5 wrote

GPT-4 is the world's best googler. As long as a similar solution existed somewhere on the internet, there's a good chance GPT-4 can pick it up, even if it isn't on LeetCode yet.

1

Nhabls t1_je93uvg wrote

The way they defined human performance there is just funny.

Dividing the number of accepted answers by the total number of users... they might as well just make up a number.
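
A quick illustration, with entirely made-up numbers, of why that ratio is hard to interpret (the figures below are hypothetical, not from the paper or from LeetCode):

```python
# Hypothetical per-problem stats (not real data).
total_registered_users = 10_000   # everyone counted in the denominator
users_who_submitted = 800         # actually attempted the problem
accepted_submissions = 300        # eventually got an "Accepted" verdict

# Lumping non-attempts in with failures:
naive_rate = accepted_submissions / total_registered_users   # 0.03
# Conditioning on people who actually tried tells a different story:
attempt_rate = accepted_submissions / users_who_submitted     # 0.375

print(f"accepted / all users:  {naive_rate:.1%}")    # 3.0%
print(f"accepted / attempters: {attempt_rate:.1%}")  # 37.5%
```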

1