adt t1_j9neq5w wrote
There should be quite a few models smaller than 15M params. What's your use case? A lot of the 2022-2023 optimizations mean that you can squish models onto modern GPUs now (e.g. int8 quantization).
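For reference, a minimal sketch of what that looks like in practice: loading a checkpoint in int8 via transformers + bitsandbytes. The model name and the availability of a CUDA GPU are assumptions here, not requirements of any particular model:

```python
# Minimal sketch: load a pretrained model with int8 weights.
# Assumes a CUDA GPU and recent transformers/accelerate/bitsandbytes installs.
from transformers import AutoModelForCausalLM

model_name = "gpt2"  # swap in whichever checkpoint you're evaluating

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # weights quantized to int8 at load time
    device_map="auto",   # let accelerate place layers on the GPU
)

print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.1f} MB")
```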
Designed to fit onto a standard GPU, DeepMind's Gato was bigger than I thought, with a starting size of 79M params.
Have you found the BERT compression paper, which compresses the models to 7MB? It lists some 1.2M-6.2M param models:
https://arxiv.org/pdf/1909.11687.pdf
My table shows...
*looks at table*
Smallest seems to be Microsoft Pact, which was ~30M params. Ignore that! Transformers are supposed to be wide and deep, I suppose, so it makes sense...
Many of the text-to-image models use smaller LLMs.
Also check HF, they now have 130,000 models of different sizes (to Feb/2023):
Includes a tiny-gpt2: https://huggingface.co/sshleifer/tiny-gpt2
And t5-efficient tiny ('has 15.58 million parameters and thus requires ca. 62.32 MB of memory in full precision (fp32) or 31.16 MB of memory in half precision (fp16 or bf16).'):
https://huggingface.co/google/t5-efficient-tiny
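For a quick sanity check, here's a minimal sketch that pulls both of those checkpoints and counts parameters (assumes transformers is installed; runs on CPU):

```python
# Minimal sketch: load the tiny checkpoints mentioned above and count parameters.
from transformers import AutoModel

for name in ["sshleifer/tiny-gpt2", "google/t5-efficient-tiny"]:
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.2f}M parameters")
```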
Edit: I thought of Anthropic's toy models, but they were not really LLMs. They did train a 10M model during scaling research (paper), but the model hasn't been released.
Seankala OP t1_j9npctd wrote
Thanks for the detailed answer! My use case is that the company I work at currently uses image-based models for e-commerce purposes, but we want to use text-based models as well. The image-based model(s) already take up around 30-50M parameters, so I didn't want to just bring in a 100M+ parameter model. Even 15M seems quite big.
currentscurrents t1_j9nqcno wrote
What are you trying to do? Most of the cool features of language models only emerge at much larger scales.
Seankala OP t1_j9nqmf5 wrote
That's true for all of the models. I don't really need anything cool though; all I need is a solid model that can perform simple tasks like text classification or NER well.
currentscurrents t1_j9nr930 wrote
Might want to look into something like https://spacy.io/
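As a rough sketch of what that could look like for NER (the pipeline name and example text below are just placeholders):

```python
# Minimal sketch: off-the-shelf NER with spaCy's small English pipeline.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # small pipeline, runs fine on CPU
doc = nlp("Apple AirPods Pro 2nd Generation, released September 2022, $249.")

for ent in doc.ents:
    print(ent.text, ent.label_)
```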
Friktion t1_j9oxnz6 wrote
I have some experience with FastText for e-commerce product classification. It's super lightweight and performs well as an MVP.
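A minimal sketch of how that might look with the fasttext Python package (the training file, labels, and example query here are hypothetical):

```python
# Minimal sketch: fastText supervised product-category classifier.
# Assumes: pip install fasttext, and a products.train file with lines like
# "__label__electronics wireless bluetooth earbuds with charging case"
import fasttext

model = fasttext.train_supervised(
    input="products.train",
    epoch=25,
    wordNgrams=2,   # bigrams help a lot for short product titles
)

labels, probs = model.predict("stainless steel water bottle 750ml")
print(labels, probs)

model.save_model("product_classifier.bin")
```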
cantfindaname2take t1_j9qohtm wrote
Yeah, I would try FastText before LLMs.
cantfindaname2take t1_j9qov0f wrote
For simple NER tasks some simpler models might work too, like conditional random fields. The crfsuite package has a very easy-to-use implementation, and it uses a C lib under the hood for model training.
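A rough sketch with sklearn-crfsuite, which wraps that C implementation (the features and toy training data below are purely illustrative):

```python
# Minimal sketch: a CRF tagger with sklearn-crfsuite.
# Assumes: pip install sklearn-crfsuite, plus real BIO-tagged sentences of your own.
import sklearn_crfsuite

def token_features(sentence, i):
    word = sentence[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:],
        "prev_lower": sentence[i - 1].lower() if i > 0 else "<BOS>",
    }

# Toy example; real training needs far more labelled sentences.
sentences = [["Apple", "AirPods", "Pro"], ["Samsung", "Galaxy", "Buds"]]
tags = [["B-BRAND", "B-PRODUCT", "I-PRODUCT"], ["B-BRAND", "B-PRODUCT", "I-PRODUCT"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
y = tags

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
print(crf.predict(X[:1]))
```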
chogall t1_j9nioqb wrote
> but they were not really LLMs.
What's the definition of LLM?
Seankala OP t1_j9np9ae wrote
I guess at least 100M+ parameters? I like to think of the BERT-base model as being the "starting point" of LLMs.
FluffyVista t1_j9ottk1 wrote
probably
Yahentamitsi t1_j9xi4q4 wrote
That's a good question! I'm not sure if there are any pretrained language models with fewer parameters, but you could always try training your own model from scratch and see how small you can get it.