Submitted by Destiny_Knight t3_11tab5h in singularity
BSartish t1_jciy4nt wrote
Reply to comment by liright in Those who know... by Destiny_Knight
This video explains it pretty well.
ThatInternetGuy t1_jcj2ew8 wrote
Why didn't they train once more with ChatGPT instruct data? Should cost them $160 in total.
CellWithoutCulture t1_jcjkwy1 wrote
Most likely they haven't had time.
They can also use SHP and HF-RLHF... I think those would help a lot, since LLaMA didn't get the privilege of reading Reddit (unlike ChatGPT). Loading sketch below.
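In case anyone wants to poke at those datasets, here's a quick sketch of pulling both with the `datasets` library. The repo IDs are my assumption of what "SHP" and "HF-RLHF" refer to: Stanford Human Preferences (`stanfordnlp/SHP`) and Anthropic's HH-RLHF (`Anthropic/hh-rlhf`) on the Hugging Face Hub.

```python
# Sketch: load the two preference datasets mentioned above.
# Repo IDs are my assumption of what "SHP" and "HF-RLHF" refer to.
from datasets import load_dataset

# Stanford Human Preferences: Reddit preference pairs
shp = load_dataset("stanfordnlp/SHP", split="train")

# Anthropic HH-RLHF: chosen/rejected assistant response pairs
hh = load_dataset("Anthropic/hh-rlhf", split="train")

print(shp[0]["history"][:200])   # the Reddit post/question being answered
print(hh[0]["chosen"][:200])     # the human-preferred assistant response
```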
ThatInternetGuy t1_jckmq5s wrote
>HF-RLHF
Probably no need, since this model can piggyback on the responses generated by GPT-4, so it should carry over the traits of the RLHF-tuned GPT-4 model, shouldn't it?
CellWithoutCulture t1_jcmsxjq wrote
HF-RLHF is the name of the dataset. As for RLHF: what they did to LLaMA is called "knowledge distillation," and iirc it usually isn't quite as good as RLHF. It's an approximation.
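To make the distinction concrete: distillation here just means supervised fine-tuning the student (LLaMA) on text the teacher generated, with no reward model or RL step in the loop. A minimal sketch below, not their exact setup; I'm assuming a hypothetical `distill_data.jsonl` of teacher instruction/response pairs and the `huggyllama/llama-7b` checkpoint as placeholders:

```python
# Minimal sketch of sequence-level knowledge distillation: plain supervised
# fine-tuning of a student LM on teacher-generated text. File name and
# checkpoint are hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

# Each record: {"instruction": ..., "response": ...} generated by the teacher.
data = load_dataset("json", data_files="distill_data.jsonl")["train"]

def format_example(ex):
    # The student simply imitates the teacher: next-token prediction on
    # teacher-written responses, no reward model involved.
    text = (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Response:\n{ex['response']}")
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(format_example, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama-distilled",
                           per_device_train_batch_size=4,
                           num_train_epochs=3,
                           learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The student only ever sees the teacher's single sampled answer per prompt, not the preference signal the teacher was trained on, which is why it's an approximation of RLHF rather than the real thing.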
cartmanOne t1_jcof3eq wrote
That’s for their next paper…
CellWithoutCulture t1_jcjkycz wrote
decent video