
michaelthwan_ai OP t1_jdlf8g8 wrote

Because LLMs have been released at such a rapid pace recently, I organized the recent notable models from the news into a diagram. Some may find it useful, so please allow me to share it.

Please let me know if there is anything I should change or add so that I can learn. Thank you very much.

If you want to edit or create an issue, please use this repo.

---------EDIT 20230326

Thank you for your responses, I've learnt a lot. I have updated the chart:

Changes 20230326:

  • Added: OpenChatKit, Dolly and their predecessors
  • Higher resolution

To learn:

  • RWKV/ChatRWKV related, PaLM-rlhf-pytorch

Models not considered (yet)

  • Models from 2022 or earlier (e.g. T5 (2022May)). This post was created to help people quickly gather information about new models.
  • Models that are not fully released yet (e.g. Bard, under limited review)

gopher9 t1_jdlq1jy wrote



Puzzleheaded_Acadia1 t1_jdlx1g3 wrote

What is RWKV?


Rejg t1_jdmdspx wrote

I think you are potentially missing Claude 1.0 and Claude 1.2, the Co:Here Suite, and Google Flan models.


ganzzahl t1_jdouip7 wrote

You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.

I do very much wonder whether OpenAI has tested equally-sized T5 models, and whether they have found some secret reason why we should stick with GPT models, or whether they are just doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.


signed7 t1_jdqm8lt wrote

I'm probably wrong, but I think I read somewhere that Google has a patent on encoder-decoder, and thus everyone else uses decoder-only.