
danilo62 OP t1_jeextek wrote

Oh, so those models can produce fixed-size embeddings of texts? I wasn't aware of that

2

-pkomlytyrg t1_jef4weq wrote

Generally, yes. If you use a model with a long context length (BigBird or OpenAI’s ada-002), you’ll likely be fine unless the articles you’re embedding exceed the token limit. If you’re using BERT or another, smaller model, you have to chunk and average; that can produce fixed-size vectors, but you gotta put the work in haha
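For example, here's a minimal sketch of that chunk-and-average idea, assuming `bert-base-uncased` via Hugging Face `transformers` and mean pooling (the model and pooling choices are illustrative, not the only way to do it):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_long_text(text: str, chunk_size: int = 512) -> torch.Tensor:
    """Return one fixed-size vector for a text of arbitrary length."""
    # Tokenize without truncation, then split into model-sized chunks,
    # reserving 2 slots per chunk for the [CLS]/[SEP] special tokens.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    step = chunk_size - 2
    chunks = [ids[i:i + step] for i in range(0, len(ids), step)]

    chunk_vectors = []
    with torch.no_grad():
        for chunk in chunks:
            # Re-add [CLS]/[SEP] so each chunk looks like a normal input.
            input_ids = torch.tensor(
                [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
            )
            hidden = model(input_ids).last_hidden_state  # (1, seq_len, 768)
            chunk_vectors.append(hidden.mean(dim=1))     # mean-pool tokens

    # Average the per-chunk embeddings into a single 768-dim vector.
    return torch.cat(chunk_vectors).mean(dim=0)

vector = embed_long_text("some long article text ...")
print(vector.shape)  # torch.Size([768])
```

Averaging chunk vectors is the simplest aggregation; weighting chunks by length or using the [CLS] token instead of mean pooling are common variations.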

1

danilo62 OP t1_jefnror wrote

Yeah, I'm gonna try both options (with BERT and the bigger models), but since I'm working with a big dataset I'm not sure I'll be able to use the larger models due to the token and request limits. Thanks for the help
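One way to stretch a big dataset under those request limits is batching with exponential backoff; here's a rough sketch assuming OpenAI's Python client (the batch size and retry counts are arbitrary placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_articles(articles, batch_size=100, max_retries=5):
    """Embed a list of article strings, batching to respect rate limits."""
    vectors = []
    for i in range(0, len(articles), batch_size):
        batch = articles[i:i + batch_size]
        for attempt in range(max_retries):
            try:
                resp = client.embeddings.create(
                    model="text-embedding-ada-002", input=batch
                )
                vectors.extend(d.embedding for d in resp.data)
                break
            except Exception:
                # Back off and retry on rate-limit or transient errors.
                time.sleep(2 ** attempt)
        else:
            raise RuntimeError(f"batch starting at {i} failed after retries")
    return vectors
```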

1