sineiraetstudio
sineiraetstudio t1_jdwvmxb wrote
Reply to comment by meister2983 in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Are you talking about RLHF in general? I'm specifically referring to the calibration error, which is separate from accuracy.
sineiraetstudio t1_jdws5js wrote
Reply to comment by IDe- in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Every higher-order Markov chain can be modeled as a first-order Markov chain by squashing states together.
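To make the squashing concrete, here's a minimal sketch (the transition table and tokens are toy values, not from any real model) that rewrites a second-order chain over tokens as a first-order chain whose states are (previous, current) token pairs:

```python
import random

# Second-order transitions: P(next | (prev, cur)). Toy values for illustration.
second_order = {
    ("the", "cat"): {"sat": 0.7, "ran": 0.3},
    ("cat", "sat"): {"down": 1.0},
    ("cat", "ran"): {"away": 1.0},
}

# Equivalent first-order chain: each composite state is a token pair, and a
# transition moves from (prev, cur) to (cur, next).
first_order = {
    pair: {(pair[1], nxt): p for nxt, p in dist.items()}
    for pair, dist in second_order.items()
}

def step(state):
    """Sample the next composite state from the first-order chain."""
    choices, probs = zip(*first_order[state].items())
    return random.choices(choices, weights=probs, k=1)[0]

print(step(("the", "cat")))  # ("cat", "sat") with probability 0.7
```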
sineiraetstudio t1_jdws2iv wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
(The graph doesn't give enough information to determine whether it's actually becoming more confident in its high-confidence answers, but it sounds like a reasonable enough rationale.)
I'm not sure I understand what distinction you're trying to draw. The RLHF'd version assigns its answers higher confidence than its actual accuracy warrants, unlike the original pre-trained version. That's literally the definition of overconfidence.
You might say that this is more "human-like", but being human-like doesn't mean it's good. If you only want the most likely answer, you can already get that via the sampler, while calibration errors, on the other hand, are a straight-up downside, as Paul Christiano explicitly mentions in the part you quoted. If you need accurate confidence scores (because, e.g., you only want to act when you're certain), being well-calibrated is essential.
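For reference, here's a rough sketch of how calibration error is commonly measured (expected calibration error): bin answers by stated confidence and compare each bin's average confidence to its empirical accuracy. The toy confidences below are made up for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and empirical accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            avg_conf = confidences[mask].mean()   # what the model claimed
            accuracy = correct[mask].mean()       # what actually happened
            ece += mask.mean() * abs(avg_conf - accuracy)
    return ece

# Well-calibrated: 70%-confidence answers are right ~70% of the time.
print(expected_calibration_error([0.7, 0.7, 0.7, 0.7], [1, 1, 1, 0]))  # 0.05
# Overconfident: claims 90% but is still only right ~75% of the time.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))  # 0.15
```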
sineiraetstudio t1_jdwbuig wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
I don't see how this is arguing it's a good thing; it's just a justification (which I'd expect from Paul Christiano, he's a huge fan of RLHF). The model is becoming overconfident in its answers - how could you possibly spin that as a positive?
sineiraetstudio t1_jdvvvdb wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
... that's not what's happening, though? The calibration error causes it to increase its confidence in low-accuracy answers and decrease it in medium-to-high-accuracy answers, making it more likely to output wrong answers. Seems like maybe you're confusing it with using a different sampler? Something like top-p already does what you mentioned.
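For anyone unfamiliar, here's a minimal sketch of top-p (nucleus) sampling, with illustrative token probabilities: keep the smallest set of tokens whose cumulative probability reaches p, renormalize, and sample from that set. With a small p it effectively collapses onto the most likely token(s).

```python
import numpy as np

def top_p_sample(probs, p=0.9):
    """Sample a token id from the smallest nucleus covering probability p."""
    rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]              # most likely token ids first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # smallest nucleus covering p
    nucleus = order[:cutoff]
    renormed = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=renormed)

token_probs = [0.55, 0.30, 0.10, 0.05]   # toy distribution over token ids 0..3
print(top_p_sample(token_probs, p=0.5))  # always token 0
print(top_p_sample(token_probs, p=0.8))  # token 0 or 1
```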
sineiraetstudio t1_jdymf8q wrote
Reply to comment by was_der_Fall_ist in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
Oh, RLHF absolutely has all sorts of benefits (playing with top-p only makes answers more consistent - but sometimes you want to optimize for something other than "most likely"), so it's definitely here to stay (for now?); it's just not purely positive. Ideally we'd have an RLHF'd version that's still well-calibrated (or, even better, some way to determine confidence without looking at logits that also works with chain-of-thought prompting).
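One possible (purely hypothetical) logit-free confidence signal that plays nicely with chain of thought: sample several completions at nonzero temperature and use agreement on the final answer as a rough confidence proxy, self-consistency style. `sample_answer` below is a stand-in for whatever model call extracts a final answer from a sampled chain of thought, not a real API.

```python
from collections import Counter
import random

def consistency_confidence(sample_answer, prompt, n_samples=10):
    """Return (majority_answer, agreement_rate) over n sampled completions."""
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n_samples

# Toy stand-in "model" that happens to answer correctly ~70% of the time.
def fake_model(prompt):
    return "42" if random.random() < 0.7 else "41"

print(consistency_confidence(fake_model, "What is 6 * 7?"))
```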