Submitted by michaelthwan_ai t3_121domd in MachineLearning
ganzzahl t1_jdouip7 wrote
Reply to comment by michaelthwan_ai in [N] March 2023 - Recent Instruction/Chat-Based Models and their parents by michaelthwan_ai
You're definitely missing the entire T5 (encoder-decoder) family of models. From the UL2 paper, it seems encoder-decoder models are more powerful than decoder-only models (such as the GPT family), especially if you care most about inference latency.
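A rough back-of-the-envelope for the latency point: the encoder runs once over the input, so at decoding time only the decoder's share of the parameters is active per generated token, whereas a decoder-only model runs all of its parameters every step. A toy sketch (the 11B size and the T5-style 50/50 parameter split are illustrative assumptions, not numbers from the paper, and it ignores attention over the KV cache):

```python
# Toy estimate using the rule of thumb that matmul FLOPs per token
# are roughly 2 x the parameters active at that step. Ignores
# attention over the KV cache and other per-step overheads.

def per_token_decode_flops(total_params: float, encoder_fraction: float) -> float:
    """FLOPs spent per generated token.

    The encoder runs once over the input, so only the decoder's
    share of the parameters is active at every decoding step.
    """
    decoder_params = total_params * (1.0 - encoder_fraction)
    return 2.0 * decoder_params

# Hypothetical 11B-parameter models (sizes chosen for illustration):
decoder_only = per_token_decode_flops(11e9, encoder_fraction=0.0)
enc_dec = per_token_decode_flops(11e9, encoder_fraction=0.5)  # T5-style ~50/50 split

print(enc_dec / decoder_only)  # → 0.5: roughly half the per-token compute
```

So at equal total parameter count, the encoder-decoder model does roughly half the work per generated token, which is one way to read the latency advantage.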
I do very much wonder whether OpenAI has tested equally sized T5 models, and whether they've found some secret reason we should stick with GPT models, or whether they're just doubling down on "their" idea, even if it is slightly inferior. Or maybe there are newer papers I don't know about.
signed7 t1_jdqm8lt wrote
I'm probably wrong, but I think I read somewhere that Google has a patent on the encoder-decoder architecture, which is why everyone else uses decoder-only.