danilo62 OP t1_jeextek wrote
Reply to comment by -pkomlytyrg in [D] Best deal with varying number of inputs each with variable size using and RNN? (for an NLP task) by danilo62
Oh, so those models are able to produce fixed-size embeddings of texts? I wasn't aware of that.
-pkomlytyrg t1_jef4weq wrote
Generally, yes. If you use a model with a long context length (BigBird or OpenAI's ada-002), you'll likely be fine unless the articles you're embedding are longer than the token limit. If you're using BERT or another, smaller model, you have to chunk/average; that can produce fixed-size vectors but you gotta put the work in haha
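Something like this rough sketch is what I mean by chunk/average (assuming a Hugging Face `bert-base-uncased` checkpoint; the model name and the mean-pooling choice are just illustrative, not the only way to do it):

```python
# Sketch of the chunk-and-average approach: split a long text into chunks
# that fit BERT's 512-token window, embed each chunk, and average the
# chunk vectors into one fixed-size document embedding.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_long_text(text: str, chunk_size: int = 510) -> torch.Tensor:
    """Return one fixed-size (768-dim) vector for a text of arbitrary length."""
    # Tokenize without truncation, then split into chunks that leave room
    # for the [CLS] and [SEP] special tokens.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]

    chunk_vectors = []
    with torch.no_grad():
        for chunk in chunks:
            input_ids = torch.tensor(
                [[tokenizer.cls_token_id] + chunk + [tokenizer.sep_token_id]]
            )
            output = model(input_ids)
            # Mean-pool the token embeddings of this chunk.
            chunk_vectors.append(output.last_hidden_state.mean(dim=1))

    # Average across chunks to get a single document embedding.
    return torch.cat(chunk_vectors, dim=0).mean(dim=0)
```

A long-context model skips the chunking loop entirely, which is why it's the easier path when the articles fit under its token limit.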
danilo62 OP t1_jefnror wrote
Yeah, I'm gonna try both options (with BERT and the bigger models), but since I'm working with a big dataset, I'm not sure I'll be able to use the larger models due to the token and request limits. Thanks for the help!