--algo t1_j40j84w wrote on January 12, 2023 at 9:28 AM

We are both right and wrong. To be pedantic, it's this paper for both, https://arxiv.org/abs/2203.02155, but with different training data.

Hyper1on t1_j43crwx wrote on January 12, 2023 at 9:58 PM

That's the InstructGPT paper, which is right for ChatGPT, but Copilot is based on Codex, which does not use RLHF.

But maybe it's only for the non-Codex models.

Copilot itself is the 12B Codex model, with further refinements.