Submitted by Balance- t3_124eyso in MachineLearning
TheEdes t1_je149kf wrote
Reply to comment by MrFlamingQueen in [N] OpenAI may have benchmarked GPT-4's coding ability on its own training data by Balance-
Yeah, but if you came up with a problem in your head that didn't exist word for word, then GPT-4 would be doing what they're advertising. However, if the problem appears word for word anywhere in the training data, then the test set is contaminated. If the model can learn the design patterns behind LeetCode-style questions by looking at examples of them, then it's doing something really impressive; if it can only solve problems it has seen before, then it's nothing special: they just overfit a trillion parameters on a comparatively tiny dataset.
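For context, "word for word anywhere in the training data" is usually checked with an n-gram overlap test (OpenAI's GPT-3 report used 13-grams). A minimal sketch of that idea; all names are illustrative, and the toy example uses a small n so short strings can overlap:

```python
# Hypothetical sketch of an n-gram contamination check; the function
# names and the n=13 default are illustrative, not any official code.

def ngrams(text: str, n: int = 13) -> set[str]:
    """Return the set of word-level n-grams in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(problem_text: str, training_ngrams: set[str], n: int = 13) -> bool:
    """Flag a benchmark problem if it shares any n-gram with the training corpus."""
    return bool(ngrams(problem_text, n) & training_ngrams)

# Toy example with n=5 so the short strings can actually overlap:
train_grams = ngrams(
    "given an array of integers return indices of the two numbers "
    "that add up to target",
    n=5,
)
print(is_contaminated("Given an array of integers return indices please",
                      train_grams, n=5))  # True: shares a verbatim 5-gram
print(is_contaminated("reverse a singly linked list in place",
                      train_grams, n=5))  # False: no overlap
```

A filter like this only catches verbatim reuse, which is exactly the weakness the replies below are pointing at.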
cegras t1_je2k9dr wrote
ChatGPT is great at learning the nuances of English, i.e., synonyms and metaphors. But if you feed it a reworded LeetCode question and it finds the answer within its neural net, has it learned to conceptualize? No, it has just learned that synonym ...
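Rewording is exactly what defeats an exact n-gram match, which is why contamination audits sometimes fall back on fuzzy similarity. A minimal sketch using Python's stdlib `difflib`; the example sentences are illustrative:

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1] (1.0 = identical)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

original = "Given an array of integers, return indices of the two numbers that add up to target."
reworded = "Given a list of integers, find the positions of the two values that sum to a target."
unrelated = "Reverse a singly linked list in place using constant extra memory."

# A synonym-level rewording stays far closer to the original than an
# unrelated problem, so a similarity threshold can still flag it.
print(similarity(original, reworded) > similarity(original, unrelated))  # True
```

Of course, a high fuzzy-match score only shows the *question* was likely seen; it can't settle whether the model conceptualized the solution or memorized it.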
TheEdes t1_je6tweq wrote
Sure, but what's being advertised isn't sentience per se, at least for the LeetCode part of their benchmarks. The issue is that they claim it scores X% on LeetCode, yet it appears to score much lower on new problems. Even if it had merely learned to retrieve previous solutions and adapt them, it should still perform well on new problems, given how formulaic they are.
MrFlamingQueen t1_je3kywp wrote
Agreed. It's very likely contamination. Even "new" LeetCode problems existed before they were published on the website.