Submitted by michaelthwan_ai t3_121domd in MachineLearning
Comments
gopher9 t1_jdlq1jy wrote
Add RWKV.
Puzzleheaded_Acadia1 t1_jdlx1g3 wrote
What is RWKV?
fv42622 t1_jdm3vtm wrote
Puzzleheaded_Acadia1 t1_jdn4sly wrote
So from what I understand, it's faster than GPT, saves more VRAM, and it can run on GPU. What else did I miss?
DigThatData t1_jdpza0l wrote
it's an RNN
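For intuition only, here is a toy sketch (not the actual RWKV equations; all dimensions and weights below are made up) of why an RNN-style LM keeps per-token generation cost and state size constant, while a decoder-only transformer re-attends over the growing context or its KV cache:

```python
# Toy recurrence (NOT the real RWKV formulation): the only thing carried
# between decoding steps is a fixed-size hidden state.
import numpy as np

d = 64                              # hypothetical hidden size
Wx = np.random.randn(d, d) * 0.02   # input projection (toy weights)
Wh = np.random.randn(d, d) * 0.02   # recurrent projection (toy weights)

def step(x_t, h_prev):
    """One generation step: cost is O(d^2) no matter how long the sequence already is."""
    return np.tanh(x_t @ Wx + h_prev @ Wh)

h = np.zeros(d)
for t in range(2048):               # the context keeps growing...
    x_t = np.random.randn(d)        # stand-in for the current token embedding
    h = step(x_t, h)                # ...but state and per-token compute stay fixed

# A decoder-only transformer instead attends over all previous tokens (or their
# cached keys/values), so per-token compute and memory grow with context length.
```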
michaelthwan_ai OP t1_jdpyb80 wrote
added in backlog. Need some time to study. Thanks.
Rejg t1_jdmdspx wrote
I think you are potentially missing Claude 1.0 and Claude 1.2, the Co:Here Suite, and Google Flan models.
ganzzahl t1_jdouip7 wrote
You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you're most interested in inference latency.
I do very much wonder if OpenAI has tested equally-sized T5 models, and if there's some secret reason they have found as to why we should stick with GPT models, or if they're just doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.
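For readers who haven't used the T5 family, a minimal sketch of encoder-decoder inference with the Hugging Face transformers library, using the publicly released google/flan-t5-small checkpoint as an example (any T5/Flan-T5 size works the same way):

```python
# Minimal encoder-decoder (seq2seq) generation sketch with transformers.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "google/flan-t5-small"  # example checkpoint; larger variants exist
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("Translate English to German: The house is wonderful.",
                   return_tensors="pt")

# The encoder processes the whole input once; the decoder then generates
# tokens autoregressively while cross-attending to the encoder output,
# which is one reason inference latency differs from decoder-only GPTs.
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```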
signed7 t1_jdqm8lt wrote
I'm probably wrong but I think I read somewhere Google has a patent on encoder-decoder, thus everyone else uses decoder-only
maizeq t1_jdlzhql wrote
Would be useful to distinguish between SFT and RLHF tuned models
addandsubtract t1_jdlvmm6 wrote
Where do GPT-J and Dolly fall into this?
wywywywy t1_jdlz40b wrote
GPT-J & GPT-Neo are predecessors of GPT-NeoX-20B
michaelthwan_ai OP t1_jdlzwyi wrote
Sure, I think it is clear enough to show the parents of the recent models (instead of their great-great-grandparents..)
If people want, I may consider making a full one (including the older models)
wywywywy t1_jdm16va wrote
In my opinion, it'd be better to include only the currently relevant ones rather than everything under the sun.
Too much noise makes the chart less useful.
michaelthwan_ai OP t1_jdm3sgb wrote
Agreed
Puzzleheaded_Acadia1 t1_jdlx6ea wrote
Is GPT-J-6B really better than Alpaca 7B, and which runs faster?
DigThatData t1_jdmvquq wrote
the fact that it's comparable at all is pretty wild and exciting
StellaAthena t1_jdotklz wrote
It’s somewhat worse and a little faster.
[deleted] t1_jdlzp51 wrote
[deleted]
michaelthwan_ai OP t1_jdlztvv wrote
It is a good model, but it's from about a year ago and not related to the recently released LLMs, so I didn't add it (otherwise there would be tons of good models to include).
As for Dolly, it was only released yesterday; I don't have full info on it yet.
addandsubtract t1_jdm1d9h wrote
Ok, no worries. I'm just glad there's a map to guide the madness going on, atm. Adding legacy models would be good for people who come across them now, to know that they are legacy.
DigThatData t1_jdmvjyb wrote
Dolly is important precisely because the foundation model is old. They were able to get ChatGPT-level performance out of it, and they only trained it for three hours. Just because the base model is old doesn't mean this isn't recent research. It demonstrates:
- the efficacy of instruct finetuning (see the sketch after this comment)
- that instruct finetuning doesn't require the world's biggest, most modern model, or even all that much data
Dolly isn't research from a year ago; it was only just described for the first time a few days ago.
EDIT: ok, I just noticed you have an ERNIE model up there, so this "no old foundation models" thing is just inconsistent.
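For context, instruct finetuning of this kind usually boils down to formatting (instruction, response) pairs into a prompt template and continuing to train the base causal LM on them. The sketch below is a hypothetical illustration of that data/loss wiring, not Databricks' actual Dolly training code; the checkpoint name and template are assumptions.

```python
# Hypothetical instruct-finetuning data setup (NOT the real Dolly recipe).
from transformers import AutoTokenizer

TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

examples = [
    {"instruction": "Summarize what instruct finetuning does.",
     "response": "It teaches a pretrained LM to follow instructions by "
                 "training on (instruction, response) pairs."},
]

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default

texts = [TEMPLATE.format(**ex) + tokenizer.eos_token for ex in examples]
batch = tokenizer(texts, return_tensors="pt", padding=True)
batch["labels"] = batch["input_ids"].clone()  # standard causal-LM loss

# From here one would load AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")
# and train with transformers.Trainer or a custom loop; per the comment above,
# Dolly's finetuning reportedly took only about three hours.
```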
Historical-Tree9132 t1_jdlo76y wrote
Missing the dataset flow arrow to the China-related models..
michaelthwan_ai OP t1_jdlzq6i wrote
I considered it haha, but I have no evidence
light24bulbs t1_jdm04sx wrote
Are those it? Surely there's a bunch more notable open source ones?
michaelthwan_ai OP t1_jdm3v2y wrote
Please suggest some.
philipgutjahr t1_jdn2p2u wrote
from https://www.reddit.com/r/MachineLearning/comments/11uk8ti/d_totally_open_alternatives_to_chatgpt/
OpenChatKit (based on GPT-NeoX-20B) https://www.together.xyz/blog/openchatkit
Instruct-GPT https://carper.ai/instruct-gpt-announcement/
michaelthwan_ai OP t1_jdpxtp1 wrote
Open alternatives -> added most of them; the rest are in the TODO (e.g. PaLM)
OpenChatKit -> added
Instruct-GPT -> it seems it's not a released model yet, just a plan.
philipgutjahr t1_jdq43bq wrote
not sure if this is true, but afaik ChatGPT is basically an implementation of InstructGPT (where OpenAI has been very thorough with RLHF)
"instance of" https://nextword.dev/blog/chatgpt-instructgpt-gpt3-explained-in-plain-english
"sibbling but a lot better" https://openai.com/blog/chatgpt
Small-Fall-6500 t1_jdmftp5 wrote
michaelthwan_ai OP t1_jdpxuzs wrote
ChatGPT-like GitHub list -> added most of them; the rest are in the TODO (e.g. PaLM)
RWKV -> added in backlog
philipgutjahr t1_jdn95o8 wrote
for completeness, you should also add all those proprietary models: Megatron-Turing (530B, NVIDIA), Gopher (280B, Google), Chinchilla (70B, DeepMind) and Chatgenie (WriteCream)
michaelthwan_ai OP t1_jdpy06p wrote
I only include recent LLMs (Feb/Mar 2023) (those are usually the LLMs at the bottom) and their predecessors up to two generations back (parent/grandparent). Check whether the ones you mentioned are related to those.
DigThatData t1_jdmv87n wrote
don't forget Dolly, the Databricks model that was successfully instruct-finetuned from GPT-J-6B in 3 hours
michaelthwan_ai OP t1_jdpy0l2 wrote
added, thank you.
Ph0masta t1_jdn5o16 wrote
Where does Google’s LaMDA fit on this chart?
StellaAthena t1_jdotc87 wrote
It’s its own block, not connected to anything
michaelthwan_ai OP t1_jdpy7yf wrote
I may include Bard if it is fully released.
So LaMDA -> Bard (maybe). But it is still in alpha/beta.
_underlines_ t1_jdo0o85 wrote
nice one.
i would add the natively trained alpaca models, which exist besides alpaca-lora. see my model card for this:
and here's an overview of almost every LLM under the sun:
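For anyone unclear on the distinction being drawn here: "natively trained" Alpaca updates all of the base model's weights, whereas alpaca-lora trains only small low-rank adapter matrices on top of a frozen base. A rough sketch with the peft library follows; the local checkpoint path and target module names are assumptions, not a verified recipe.

```python
# Rough LoRA setup sketch with peft (illustrative, not the alpaca-lora repo).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")  # hypothetical local checkpoint

lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections in LLaMA-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)     # base weights frozen, adapters trainable
model.print_trainable_parameters()         # usually well under 1% of all parameters
```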
DarkTarantino t1_jdly8y7 wrote
If I wanted to create graphs like these for work, what is that role called?
heuboi t1_jdm4h0t wrote
Powerpoint engineer
ZestyData t1_jdmdjrd wrote
Well.. you can just create these graphs if it's important for your current task.
There isn't a role called "Chief graph maker" who makes graphs for people when they need them.
DarkTarantino t1_jdmgpqq wrote
🤣🤣 shiiit at this point you never know
Veggies-are-okay t1_jdmopy0 wrote
Does anyone have a good resource/video giving an overview of these implementations? I don’t work much with language models but figure it might be good to understand where this is at, but I’m just running into the BuzzFeed-esque surface-level nonsense on YouTube.
tonicinhibition t1_jdn4v86 wrote
There's a YouTuber named Letitia, with a little Miss Coffee Bean character, who covers new models at a decent level.
CodeEmporium does a great job at introducing aspects of the GPT/ChatGPT architecture with increasing depth. Some of the videos have code.
Andrej Karpathy walks you through building GPT in code
As for the lesser known models, I just read the abstracts and skim the papers. It's a lot of the same stuff with slight variations.
michaelthwan_ai OP t1_jdpy5dy wrote
Thanks for sharing the above!
My choice is yk - Yannic Kilcher. His "AI News" videos are brief introductions, and he sometimes goes through certain papers in detail. Very insightful!
big_ol_tender t1_jdoe9k1 wrote
The Alpaca dataset is not open source, so alpaca-lora is not open source either.
michaelthwan_ai OP t1_jdlf8g8 wrote
Because the recent releases of LLMs have been so vigorous, I organized the notable recent models from the news. Some may find the diagram useful, so please allow me to distribute it.
Please let me know if there is anything I should change or add so that I can learn. Thank you very much.
If you want to edit or create an issue, please use this repo.
---------EDIT 20230326
Thank you for your responses, I've learnt a lot. I have updated the chart:
Changes 20230326:
To learn:
Models not considered (yet)