
enryu42 OP t1_jdrzcc9 wrote

Do you mean re-prompt it asking to correct its mistakes? It is hard to try with the current tight limits on GPT-4 prompt count; I'll try once the API is properly available. But I strongly doubt it'll help much: it's not that the solutions have minor bugs, they're usually just completely wrong, i.e. the model doesn't "get" the idea for the correct solution.

(it might help for some of the problems from the "Beginner" category though, but these aren't that interesting)

6

ghostfaceschiller t1_jds202e wrote

Yeah, it's essentially that, at an automated level. Tbh, it is powerful enough, based on results so far, that I would actually be really surprised if it did not yield very significant gains in these tests.

I'm sure there will be a paper out doing it in like the next few days, so we'll see

15

Jeffy29 t1_jdsm90r wrote

>But I strongly doubt it'll help much: it's not that the solutions have minor bugs, they're usually just completely wrong

I strongly doubt that it wouldn't help. I haven't tested GPT-4 in coding, but from what I've seen GPT-3 makes a number of simple errors; in longer, complex code it's almost inevitable. But it's able to quickly identify and correct them when you point them out. GPT-4 not being able to compile and test its own code is a big limitation that humans don't have. It also can't actually do the math, it's essentially guessing the result of the calculation, but both can be addressed with an external compiler and a calculator like Wolfram, something humans also have access to. There would need to be some time limit imposed so it can't brute force the solution after guessing for a few days, but even so I think the improvements would be quite large.
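
To make the idea concrete, here is a rough sketch of what a "run the code, feed the error back, stop after a time limit" loop could look like. Nothing here is a real GPT-4 API: `ask_model` is a hypothetical callable you would supply yourself, and `run_candidate`, `repair_loop`, `budget_s` and `max_attempts` are names made up for illustration. It also assumes the candidate solutions are Python scripts that read stdin and print the answer.

```python
import subprocess
import sys
import time
from typing import Callable, Optional, Tuple


def run_candidate(source: str, stdin_text: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run candidate Python source in a subprocess.

    Returns (True, stdout) on a clean exit, or (False, error text) otherwise.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


def repair_loop(
    ask_model: Callable[[str, str], str],  # hypothetical: (problem, feedback) -> source code
    problem: str,
    stdin_text: str,
    expected: str,
    budget_s: float = 60.0,   # assumed wall-clock limit, so it can't guess for days
    max_attempts: int = 5,    # assumed cap on re-prompts
) -> Optional[str]:
    """Re-prompt with execution feedback until the output matches or the budget runs out."""
    deadline = time.monotonic() + budget_s
    feedback = ""
    for _ in range(max_attempts):
        if time.monotonic() >= deadline:
            break
        source = ask_model(problem, feedback)
        ok, output = run_candidate(source, stdin_text)
        if ok and output == expected.strip():
            return source
        # Hand the previous attempt and what actually happened back to the model.
        feedback = (
            f"Previous attempt:\n{source}\n"
            f"Result: {output!r}, expected {expected!r}"
        )
    return None
```

Whether this helps when the approach itself is wrong, rather than just buggy, is exactly the open question in this thread; the loop only automates the "point out the mistake" step.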

3

sdmat t1_jdt9ik9 wrote

> There would need to be some time limit imposed so it can't brute force the solution after guessing for a few days

Not exactly unheard of for junior programmers, to be fair.

3

farmingvillein t1_jdsdalw wrote

> Do you mean re-prompt it asking to correct its mistakes?

Well, re-prompt + asking it to bake in test cases upfront and continuously analyze how failures line up with the test cases.
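
As a sketch of the "test cases upfront" part: assuming the tests are fixed before any solution is written, each candidate can be checked against them and the failures summarized into text for the next prompt. `Failure`, `check_against_tests` and `failure_report` are illustrative names, not from any library, and the candidate is assumed to be a plain Python function.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class Failure:
    args: Tuple[Any, ...]
    expected: Any
    actual: str  # repr of the wrong value, or a description of the exception


def check_against_tests(
    candidate: Callable[..., Any],
    tests: Sequence[Tuple[Tuple[Any, ...], Any]],
) -> List[Failure]:
    """Run the candidate on every (args, expected) pair and collect the mismatches."""
    failures: List[Failure] = []
    for args, expected in tests:
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash counts as a failed test case
            failures.append(Failure(args, expected, f"raised {type(exc).__name__}: {exc}"))
            continue
        if actual != expected:
            failures.append(Failure(args, expected, repr(actual)))
    return failures


def failure_report(failures: List[Failure], total: int) -> str:
    """Turn the failures into text that could be pasted into the next prompt."""
    if not failures:
        return f"All {total} test cases pass."
    lines = [f"{len(failures)} of {total} test cases fail:"]
    for f in failures:
        lines.append(f"- input {f.args!r}: expected {f.expected!r}, got {f.actual}")
    return "\n".join(lines)
```

For example, with tests `[((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]` for an "add two numbers" task, a candidate that multiplies instead would fail the first and third cases, and the report names exactly those inputs.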

1