adt t1_j9neq5w wrote
There should be quite a few models smaller than 15M params. What's your use case? A lot of the 2022-2023 optimizations mean that you can squish models onto modern GPUs now (e.g. int8 quantization).
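For reference, a minimal sketch of what that looks like in practice: loading a checkpoint in int8 via transformers + bitsandbytes. The model name and the availability of a CUDA GPU are assumptions here, not requirements of any particular model:

```python
# Minimal sketch: load a pretrained model with int8 weights.
# Assumes a CUDA GPU and recent transformers/accelerate/bitsandbytes installs.
from transformers import AutoModelForCausalLM

model_name = "gpt2"  # swap in whichever checkpoint you're evaluating

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # weights quantized to int8 at load time
    device_map="auto",   # let accelerate place layers on the GPU
)

print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.1f} MB")
```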
Designed to fit onto a standard GPU, DeepMind's Gato was bigger than I thought, with a starting size of 79M params.
Have you found the BERT compression paper, which compresses the models to 7MB? It lists some 1.2M-6.2M param models:
https://arxiv.org/pdf/1909.11687.pdf
My table shows...
*looks at table*
Smallest seems to be Microsoft Pact, which was ~30M params. Ignore that! Transformers are supposed to be wide and deep, I suppose, so it makes sense...
Many of the text-to-image models use smaller LLMs.
Also check HF, they now have 130,000 models of different sizes (to Feb/2023):
Includes a tiny-gpt2: https://huggingface.co/sshleifer/tiny-gpt2
And t5-efficient tiny ('has 15.58 million parameters and thus requires ca. 62.32 MB of memory in full precision (fp32) or 31.16 MB of memory in half precision (fp16 or bf16).'):
https://huggingface.co/google/t5-efficient-tiny
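For a quick sanity check, here's a minimal sketch that pulls both of those checkpoints and counts parameters (assumes transformers is installed; runs on CPU):

```python
# Minimal sketch: load the tiny checkpoints mentioned above and count parameters.
from transformers import AutoModel

for name in ["sshleifer/tiny-gpt2", "google/t5-efficient-tiny"]:
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.2f}M parameters")
```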
Edit: I thought of Anthropic's toy models, but they were not really LLMs. They did train a 10M model during scaling research (paper), but the model hasn't been released.
Seankala OP t1_j9npctd wrote
Thanks for the detailed answer! My use case is that the company I work at currently uses image-based models for e-commerce purposes, but we want to use text-based models as well. The image-based model(s) already take up around 30-50M parameters, so I didn't want to just bring in a 100M+ parameter model. Even 15M seems quite big.
currentscurrents t1_j9nqcno wrote
What are you trying to do? Most of the cool features of language models only emerge at much larger scales.
Seankala OP t1_j9nqmf5 wrote
That's true for all of the models. I don't really need anything cool though; all I need is a solid model that can perform simple tasks like text classification or NER well.
currentscurrents t1_j9nr930 wrote
Might want to look into something like https://spacy.io/
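As a rough sketch of what that could look like for NER (the pipeline name and example text below are just placeholders):

```python
# Minimal sketch: off-the-shelf NER with spaCy's small English pipeline.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # small pipeline, runs fine on CPU
doc = nlp("Apple AirPods Pro 2nd Generation, released September 2022, $249.")

for ent in doc.ents:
    print(ent.text, ent.label_)
```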
Friktion t1_j9oxnz6 wrote
I have some experience with FastText for e-commerce product classification. It's super lightweight and performs well as an MVP.
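A minimal sketch of how that might look with the fasttext Python package (the training file, labels, and example query here are hypothetical):

```python
# Minimal sketch: fastText supervised product-category classifier.
# Assumes: pip install fasttext, and a products.train file with lines like
# "__label__electronics wireless bluetooth earbuds with charging case"
import fasttext

model = fasttext.train_supervised(
    input="products.train",
    epoch=25,
    wordNgrams=2,   # bigrams help a lot for short product titles
)

labels, probs = model.predict("stainless steel water bottle 750ml")
print(labels, probs)

model.save_model("product_classifier.bin")
```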
cantfindaname2take t1_j9qohtm wrote
Yeah, I would try FastText before LLMs.
cantfindaname2take t1_j9qov0f wrote
For simple NER tasks some simpler models might work too, like conditional random fields. The crfsuite package has a very easy-to-use implementation, and it uses a C lib under the hood for model training.
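A rough sketch with sklearn-crfsuite, which wraps that C implementation (the features and toy training data below are purely illustrative):

```python
# Minimal sketch: a CRF tagger with sklearn-crfsuite.
# Assumes: pip install sklearn-crfsuite, plus real BIO-tagged sentences of your own.
import sklearn_crfsuite

def token_features(sentence, i):
    word = sentence[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:],
        "prev_lower": sentence[i - 1].lower() if i > 0 else "<BOS>",
    }

# Toy example; real training needs far more labelled sentences.
sentences = [["Apple", "AirPods", "Pro"], ["Samsung", "Galaxy", "Buds"]]
tags = [["B-BRAND", "B-PRODUCT", "I-PRODUCT"], ["B-BRAND", "B-PRODUCT", "I-PRODUCT"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
y = tags

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
print(crf.predict(X[:1]))
```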
chogall t1_j9nioqb wrote
> but they were not really LLMs.
What's the definition of LLM?
Seankala OP t1_j9np9ae wrote
I guess at least 100M+ parameters? I like to think of the BERT-base model as being the "starting point" of LLMs.
FluffyVista t1_j9ottk1 wrote
probably
Yahentamitsi t1_j9xi4q4 wrote
That's a good question! I'm not sure if there are any pretrained language models with fewer parameters, but you could always try training your own model from scratch and see how small you can get it.