cfoster0 t1_izuxn52 wrote
Reply to comment by FerretDude in [R] Illustrating Reinforcement Learning from Human Feedback (RLHF) by robotphilanthropist
Did y'all stop doing work out in the open? That's a shame. End of an era, I guess.
cfoster0 t1_izrdeii wrote
Reply to comment by FerretDude in [R] Illustrating Reinforcement Learning from Human Feedback (RLHF) by robotphilanthropist
Who? Who's even using RLHF in production yet, besides OpenAI (and maybe Cohere)?
cfoster0 t1_izlys6v wrote
About this bit:
> At the moment, TRLX has an API capable of production-ready RLHF at the scales required for LLM deployment (e.g. 33 billion parameters). Future versions of TRLX will allow for language models up to 200B parameters. As such, interfacing with TRLX is optimized for machine learning engineers with experience at this scale.
Has TRLX been used to tune models in production already? Or if not, what did the blog post mean by "capable of production-ready RLHF"? I haven't seen any RLHF-ed models built on open source software yet, much less a 33B parameter one.
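For reference, the interface the repo advertises is roughly the following (a minimal sketch based on the trlX README; the reward function here is a stand-in for a real learned reward model, and exact signatures may have changed):

```python
import trlx

# Placeholder reward: score each generated sample. In real RLHF this
# would be a reward model trained on human preference data.
def reward_fn(samples, **kwargs):
    return [float(len(sample)) for sample in samples]

# PPO fine-tuning against the reward; trlX handles rollout generation
# and optimization internally.
trainer = trlx.train("gpt2", reward_fn=reward_fn)
```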
EDIT: Also hi @FerretDude
cfoster0 t1_j4alveu wrote
Reply to [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
FWIW, in a certain sense this goes against the design philosophy of transformers, which is to compute all representations within a layer jointly, maximizing the degree of parallelism on GPUs and other accelerators.
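To make that concrete, here's a minimal PyTorch sketch (shapes and hyperparameters are illustrative, not from any particular model):

```python
import torch
import torch.nn as nn

# One transformer layer processes every position in the sequence with
# the same batched matmuls; there is no per-token control flow that
# could allocate extra compute to harder-to-predict tokens.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

x = torch.randn(2, 10, 64)  # (batch, sequence length, model dim)
out = layer(x)              # all 10 positions are computed jointly
print(out.shape)            # torch.Size([2, 10, 64])
```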