gopher9 t1_jdlq1jy wrote on March 25, 2023 at 9:35 AM

Add RWKV.

Puzzleheaded_Acadia1 t1_jdlx1g3 wrote on March 25, 2023 at 11:13 AM

What is RWKV?

fv42622 t1_jdm3vtm wrote on March 25, 2023 at 12:29 PM

https://github.com/BlinkDL/RWKV-LM

Puzzleheaded_Acadia1 t1_jdn4sly wrote on March 25, 2023 at 5:15 PM

So from what I understand it faster the gpt and saves more VRAM and it can run on gpu what else did I miss

DigThatData t1_jdpza0l wrote on March 26, 2023 at 7:38 AM

it's an RNN

michaelthwan_ai OP t1_jdpyb80 wrote on March 26, 2023 at 7:24 AM

added in backlog. Need some time to study. Thanks.

Rejg t1_jdmdspx wrote on March 25, 2023 at 1:58 PM

I think you are potentially missing Claude 1.0 and Claude 1.2, the Co:Here Suite, and Google Flan models.

ganzzahl t1_jdouip7 wrote on March 26, 2023 at 12:50 AM

You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper , it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.

I do very much wonder if OpenAI has tested equally-sized T5 models, and if there's some secret reason they have found as to why we should stick with GPT models, or if they just are doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.

signed7 t1_jdqm8lt wrote on March 26, 2023 at 12:45 PM

I'm probably wrong but I think I read somewhere Google has a patent on encoder-decoder, thus everyone else uses decoder-only

maizeq t1_jdlzhql wrote on March 25, 2023 at 11:42 AM

Would be useful to distinguish between SFT and RLHF tuned models

[deleted] t1_jdpz1d5 wrote on March 26, 2023 at 7:34 AM

[deleted]

[N] March 2023 - Recent Instruction/Chat-Based Models and their parents

michaelthwan_ai OP t1_jdlf8g8 wrote on March 25, 2023 at 6:55 AM