Franck_Dernoncourt t1_j6ydkiu wrote on February 2, 2023 at 7:37 PM

> I was surprised at how much better GPT3 davinci 003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to GPT3 davinci 001.

On which tasks?

> Of course, I didn't expect the smaller models to be on par with GPT-3

You could read Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, and Tatsunori B. Hashimoto, "Benchmarking Large Language Models for News Summarization," arXiv:2301.13848:

> we find instruction tuning, and not model size, is the key to the LLM’s zero-shot summarization capability