Lawjarp2 t1_jcb2j0i wrote

It scores at the 5th percentile on Codeforces. It can barely solve medium-hard questions on LeetCode.

Most software development doesn't require being good at either of those. But they do indicate one's ability to make the leaps of logic required to solve something like AGI. GPT-4 is not ready for that yet.

14

throwawaydthrowawayd t1_jcb48bo wrote

Unfortunately, they didn't tell us anything about how they ran the Codeforces test. It sounds like they just tried zero-shot: GPT-4 saw the problem and immediately wrote code to solve it. But that's not how humans solve Codeforces problems; we sit down and think through the problem. In a more real-world scenario, I think GPT-4 would do way better at Codeforces. Still not as good as a human, but definitely way better than in their test.

12

SoylentRox t1_jcb6ljc wrote

They could fine-tune it, use prompting or multi-pass reasoning, or give it an internal Python interpreter. There are lots of options that would more fairly produce results closer to what this generation of compute plus model architecture is capable of.

I don't know how well that would do, but I expect better than the median human, since that is the result Google got using a weaker model than GPT-4.
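
To make the multi-pass idea with an interpreter in the loop concrete, here is a rough sketch of what such a harness might look like. It is not how OpenAI or Google ran their evaluations; nothing in this thread says how they did. `generate` is a hypothetical stand-in for a model call (no real API is used), and `fake_generate` is a toy stub that returns a wrong answer first and a corrected one once it receives feedback.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Tuple


def run_candidate(source: str, stdin_text: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Run candidate Python source in a subprocess and return (ok, output_or_error)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


def solve_with_retries(
    problem: str,
    samples: List[Tuple[str, str]],
    generate: Callable[[str, str], str],  # hypothetical model call: (problem, feedback) -> code
    max_passes: int = 3,
) -> Optional[str]:
    """Ask for code, run it on the sample tests, and feed failures back for another pass."""
    feedback = ""
    for _ in range(max_passes):
        source = generate(problem, feedback)
        failures = []
        for stdin_text, expected in samples:
            ok, out = run_candidate(source, stdin_text)
            if not ok or out != expected.strip():
                failures.append(
                    f"input {stdin_text!r}: expected {expected.strip()!r}, got {out!r}"
                )
        if not failures:
            return source  # every sample test passed
        feedback = "\n".join(failures)
    return None  # gave up after max_passes attempts


def fake_generate(problem: str, feedback: str) -> str:
    """Toy stand-in for a model: wrong on the first pass, fixed after seeing feedback."""
    if not feedback:
        return "print(int(input()) + 1)"
    return "print(int(input()) * 2)"
```

Calling `solve_with_retries("Double the input number.", [("3\n", "6"), ("5\n", "10")], fake_generate)` fails the first pass, passes the failure messages back, and returns the corrected program on the second. A zero-shot test corresponds to `max_passes=1` with no feedback at all.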

6

MustacheEmperor t1_jcc5crl wrote

Our CTO and I tried getting it to write some relatively challenging Swift as a benchmark example and it just repeatedly botched it. It would produce something close to working code, but kept insisting on using libraries that didn't have support for what it was trying to do with them, which was also an issue with 3.5.

3