Submitted by michaelthwan_ai t3_121domd in MachineLearning
Comments
gopher9 t1_jdlq1jy wrote
Add RWKV.
Puzzleheaded_Acadia1 t1_jdlx1g3 wrote
What is RWKV?
fv42622 t1_jdm3vtm wrote
Puzzleheaded_Acadia1 t1_jdn4sly wrote
So from what I understand, it's faster than GPT, saves more VRAM, and it can run on GPU. What else did I miss?
DigThatData t1_jdpza0l wrote
it's an RNN
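For intuition only, here is a toy sketch (not the actual RWKV equations; all dimensions and weights below are made up) of why an RNN-style LM keeps per-token generation cost and state size constant, while a decoder-only transformer re-attends over the growing context or its KV cache:

```python
# Toy recurrence (NOT the real RWKV formulation): the only thing carried
# between decoding steps is a fixed-size hidden state.
import numpy as np

d = 64                              # hypothetical hidden size
Wx = np.random.randn(d, d) * 0.02   # input projection (toy weights)
Wh = np.random.randn(d, d) * 0.02   # recurrent projection (toy weights)

def step(x_t, h_prev):
    """One generation step: cost is O(d^2) no matter how long the sequence already is."""
    return np.tanh(x_t @ Wx + h_prev @ Wh)

h = np.zeros(d)
for t in range(2048):               # the context keeps growing...
    x_t = np.random.randn(d)        # stand-in for the current token embedding
    h = step(x_t, h)                # ...but state and per-token compute stay fixed

# A decoder-only transformer instead attends over all previous tokens (or their
# cached keys/values), so per-token compute and memory grow with context length.
```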
michaelthwan_ai OP t1_jdpyb80 wrote
added in backlog. Need some time to study. Thanks.
Rejg t1_jdmdspx wrote
I think you are potentially missing Claude 1.0 and Claude 1.2, the Co:Here Suite, and Google Flan models.
ganzzahl t1_jdouip7 wrote
You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.
I do very much wonder if OpenAI has tested equally-sized T5 models, and if there's some secret reason they have found as to why we should stick with GPT models, or if they're just doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.
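For readers who haven't used the T5 family, a minimal sketch of encoder-decoder inference with the Hugging Face transformers library, using the publicly released google/flan-t5-small checkpoint as an example (any T5/Flan-T5 size works the same way):

```python
# Minimal encoder-decoder (seq2seq) generation sketch with transformers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/flan-t5-small"  # example checkpoint; larger variants exist
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("Translate English to German: The house is wonderful.",
                   return_tensors="pt")

# The encoder processes the whole input once; the decoder then generates
# tokens autoregressively while cross-attending to the encoder output,
# which is one reason inference latency differs from decoder-only GPTs.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```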
signed7 t1_jdqm8lt wrote
I'm probably wrong but I think I read somewhere Google has a patent on encoder-decoder, thus everyone else uses decoder-only
maizeq t1_jdlzhql wrote
Would be useful to distinguish between SFT and RLHF tuned models
addandsubtract t1_jdlvmm6 wrote
Where do GPT-J and Dolly fall into this?
wywywywy t1_jdlz40b wrote
GPT-J & GPT-Neo are predecessors of GPT-NeoX-20B
michaelthwan_ai OP t1_jdlzwyi wrote
Sure, I think it is clear enough to show the parents of the recent models (instead of their great-great-grandparents..)
If people want, I may consider making a full one (including the older models)
wywywywy t1_jdm16va wrote
In my opinion, it'd be better to include only the currently relevant ones rather than everything under the sun.
Too much noise makes the chart less useful.
michaelthwan_ai OP t1_jdm3sgb wrote
Agreed
Puzzleheaded_Acadia1 t1_jdlx6ea wrote
Is GPT-J-6B really better than Alpaca 7B, and which runs faster?
DigThatData t1_jdmvquq wrote
the fact that it's comparable at all is pretty wild and exciting
StellaAthena t1_jdotklz wrote
It’s somewhat worse and a little faster.
[deleted] t1_jdlzp51 wrote
[deleted]
michaelthwan_ai OP t1_jdlztvv wrote
It is a good model, but it's from about a year ago and not related to the recently released LLMs, so I didn't add it (otherwise there would be tons of good models to include).
As for Dolly, it was only released yesterday; I don't have full info on it yet.
addandsubtract t1_jdm1d9h wrote
Ok, no worries. I'm just glad there's a map to guide the madness going on, atm. Adding legacy models would be good for people who come across them now, to know that they are legacy.
DigThatData t1_jdmvjyb wrote
Dolly is important precisely because the foundation model is old. They were able to get ChatGPT-level performance out of it, and they only trained it for three hours. Just because the base model is old doesn't mean this isn't recent research. It demonstrates:
- the efficacy of instruct finetuning (see the sketch after this comment)
- that instruct finetuning doesn't require the world's biggest, most modern model, or even all that much data
Dolly isn't research from a year ago; it was only just described for the first time a few days ago.
EDIT: ok, I just noticed you have an ERNIE model up there, so this "no old foundation models" thing is just inconsistent.
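For context, instruct finetuning of this kind usually boils down to formatting (instruction, response) pairs into a prompt template and continuing to train the base causal LM on them. The sketch below is a hypothetical illustration of that data/loss wiring, not Databricks' actual Dolly training code; the checkpoint name and template are assumptions.

```python
# Hypothetical instruct-finetuning data setup (NOT the real Dolly recipe).
from transformers import AutoTokenizer

TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

examples = [
    {"instruction": "Summarize what instruct finetuning does.",
     "response": "It teaches a pretrained LM to follow instructions by "
                 "training on (instruction, response) pairs."},
]

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default

texts = [TEMPLATE.format(**ex) + tokenizer.eos_token for ex in examples]
batch = tokenizer(texts, return_tensors="pt", padding=True)
batch["labels"] = batch["input_ids"].clone()  # standard causal-LM loss

# From here one would load AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
# and train with transformers.Trainer or a custom loop; per the comment above,
# Dolly's finetuning reportedly took only about three hours.
```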
Historical-Tree9132 t1_jdlo76y wrote
Missing the dataset flow arrow to the China-related models..
michaelthwan_ai OP t1_jdlzq6i wrote
I considered it haha, but I have no evidence
light24bulbs t1_jdm04sx wrote
Are those it? Surely there's a bunch more notable open source ones?
michaelthwan_ai OP t1_jdm3v2y wrote
Please suggest some.
philipgutjahr t1_jdn2p2u wrote
from https://www.reddit.com/r/MachineLearning/comments/11uk8ti/d_totally_open_alternatives_to_chatgpt/
OpenChatKit (based on GPT-NeoX-20B) https://www.together.xyz/blog/openchatkit
Instruct-GPT https://carper.ai/instruct-gpt-announcement/
michaelthwan_ai OP t1_jdpxtp1 wrote
Open alternatives -> added most of them; the rest are in the TODO (e.g. PaLM)
OpenChatKit -> added
Instruct-GPT -> it seems it's not a released model yet, just a plan.
philipgutjahr t1_jdq43bq wrote
not sure if this is true, but afaik ChatGPT is basically an implementation of InstructGPT (where OpenAI has been very thorough with RLHF)
"instance of" https://nextword.dev/blog/chatgpt-instructgpt-gpt3-explained-in-plain-english
"sibbling but a lot better" https://openai.com/blog/chatgpt
Small-Fall-6500 t1_jdmftp5 wrote
michaelthwan_ai OP t1_jdpxuzs wrote
ChatGPT-like GitHub list -> added most of them; the rest are in the TODO (e.g. PaLM)
RWKV -> added in backlog
philipgutjahr t1_jdn95o8 wrote
for completeness, you should also add all those proprietary models: Megatron-Turing (530B, NVIDIA), Gopher (280B, Google), Chinchilla (70B, DeepMind) and Chatgenie (WriteCream)
michaelthwan_ai OP t1_jdpy06p wrote
I only include recent LLMs (Feb/Mar 2023) (those are usually the LLMs at the bottom) and their predecessors up to two generations back (parent/grandparent). Check whether the ones you mentioned are related to those.
DigThatData t1_jdmv87n wrote
don't forget Dolly, the Databricks model that was successfully instruct-finetuned from GPT-J-6B in 3 hours
michaelthwan_ai OP t1_jdpy0l2 wrote
added, thank you.
Ph0masta t1_jdn5o16 wrote
Where does Google’s LaMDA fit on this chart?
StellaAthena t1_jdotc87 wrote
It’s its own block, not connected to anything
michaelthwan_ai OP t1_jdpy7yf wrote
I may include Bard if it is fully released.
So LaMDA -> Bard (maybe). But it is still in alpha/beta.
_underlines_ t1_jdo0o85 wrote
nice one.
i would add the natively trained alpaca models, which exist besides alpaca-lora. see my model card for this:
and here's an overview of almost every LLM under the sun:
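For anyone unclear on the distinction being drawn here: "natively trained" Alpaca updates all of the base model's weights, whereas alpaca-lora trains only small low-rank adapter matrices on top of a frozen base. A rough sketch with the peft library follows; the local checkpoint path and target module names are assumptions, not a verified recipe.

```python
# Rough LoRA setup sketch with peft (illustrative, not the alpaca-lora repo).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")  # hypothetical local checkpoint

lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections in LLaMA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)     # base weights frozen, adapters trainable
model.print_trainable_parameters()         # usually well under 1% of all parameters
```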
DarkTarantino t1_jdly8y7 wrote
If I wanted to create graphs like these for work, what is that role called?
heuboi t1_jdm4h0t wrote
Powerpoint engineer
ZestyData t1_jdmdjrd wrote
Well.. you can just create these graphs if it's important for your current task.
There isn't a role called "Chief graph maker" who makes graphs for people when they need them.
DarkTarantino t1_jdmgpqq wrote
🤣🤣 shiiit at this point you never know
Veggies-are-okay t1_jdmopy0 wrote
Does anyone have a good resource/video giving an overview of these implementations? I don’t work much with language models but figure it might be good to understand where this is at, but I’m just running into the BuzzFeed-esque surface-level nonsense on YouTube.
tonicinhibition t1_jdn4v86 wrote
There's a YouTuber named Letitia, with a little Miss Coffee Bean character, who covers new models at a decent level.
CodeEmporium does a great job at introducing aspects of the GPT/ChatGPT architecture with increasing depth. Some of the videos have code.
Andrej Karpathy walks you through building GPT in code
As for the lesser known models, I just read the abstracts and skim the papers. It's a lot of the same stuff with slight variations.
michaelthwan_ai OP t1_jdpy5dy wrote
Thanks for sharing the above!
My choice is yk - Yannic Kilcher. His "AI News" videos are brief introductions, and he sometimes goes through certain papers in detail. Very insightful!
big_ol_tender t1_jdoe9k1 wrote
The Alpaca dataset is not open source, so alpaca-lora is not open source either.
michaelthwan_ai OP t1_jdlf8g8 wrote
Because the recent releases of LLMs have been so vigorous, I organized the notable recent models from the news. Some may find the diagram useful, so please allow me to distribute it.
Please let me know if there is anything I should change or add so that I can learn. Thank you very much.
If you want to edit or create an issue, please use this repo.
---------EDIT 20230326
Thank you for your responses, I've learnt a lot. I have updated the chart:
Changes 20230326:
To learn:
Models not considered (yet)