wojapa t1_jdl23pj wrote on March 25, 2023 at 4:17 AM

Did they use RLHF?

Vegetable-Skill-9700 OP t1_jdl2fbp wrote on March 25, 2023 at 4:20 AM

I think it's just supervised training, similar to Alpaca, I guess.

[removed]

GPT-J-6B fine-tuned on Alpaca’s instruction dataset.

No, check their git repo. They used HF Transformers' AutoModelForCausalLM in their training script. It's supervised fine-tuning.
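
For anyone wondering what "supervised fine-tuning" on an instruction dataset looks like at the data level, here is a rough sketch. It is not taken from their repo: the prompt text is an Alpaca-style template, the whitespace tokenizer and the `build_example` helper are made up for illustration, and masking the prompt with -100 is one common setup that I am assuming, not something confirmed for this project. In a real script the ids would come from a proper tokenizer and be fed to a causal LM.

```python
# Sketch only: a toy whitespace "tokenizer" stands in for a real one.
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

# Alpaca-style prompt template (assumed here for illustration).
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_example(instruction, response, vocab):
    """Return (input_ids, labels) for one instruction/response pair."""

    def encode(text):
        # Hypothetical tokenizer: one id per whitespace-separated word.
        return [vocab.setdefault(word, len(vocab)) for word in text.split()]

    prompt_ids = encode(PROMPT.format(instruction=instruction))
    response_ids = encode(response)

    input_ids = prompt_ids + response_ids
    # Loss is computed only on the response; prompt positions are masked out.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_example(
    "Name a primary color.", "Red is a primary color.", vocab
)
```

The point is that there is no reward model or preference data anywhere: the model is just trained with ordinary next-token cross-entropy on the reference responses, which is what separates this from RLHF.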