gwern t1_jc42lxd wrote
Reply to comment by rolexpo in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
And yet, they get shit on for releasing it at all (never mind in a way they knew perfectly well would leak), while no one ever seems to remember all of the other models which didn't get released at all... And ironically, Google is over there releasing Flan-T5 under a FLOSS license & free to download, as it has regularly released the best T5 models, and no one notices it exists - you definitely won't find it burning up the HN or /r/ML front pages. Suffice it to say that the developer community has never been noted for its consistency or gratitude, so optimizing for that is a mug's game.
(I never fail to be boggled at complaints about 'AI safety fearmongering is why we had to wait all these years instead of OA just releasing GPT-3', where the person completely ignores the half-a-dozen other GPT-3-scale models which are still unreleased, like most models were unreleased, for reasons typically not including safety.)
extopico t1_jc5revh wrote
Flan-t5 is good and flan-t5-xl runs well on 3060 in 8 bit mode. It’s not meant to be a chatbot however so that’s why it does not stir up so much excitement. T5 is best used for tasks and training it to handle specific domains. This makes it far more interesting to me than LLaMa which cannot be trained (yet) by us randoms.
generatorman_ai t1_jc5vsbw wrote
T5 is below the zero-shot phase transition crossed by GPT-3 175B (and presumably by LLaMA 7B). Modern models with instruction and HF finetuning will not need further task-specific finetuning for most purposes.
Viewing a single comment thread. View all comments