Submitted by noellarkin t3_10rhprm in MachineLearning
Franck_Dernoncourt t1_j6ydkiu wrote
> I was surprised at how much better GPT3 davinci 003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to GPT3 davinci 001.
On which tasks?
> Of course, I didn't expect the smaller models to be on par with GPT-3
You could read Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto. Benchmarking Large Language Models for News Summarization. arXiv:2301.13848:
> we find instruction tuning, and not model size, is the key to the LLM’s zero-shot summarization capability