Southern-Trip-1102 t1_itwt1ac wrote
You might want to look into the BLOOM models on huggingface.
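If you just want to poke at them, something like this works with one of the small checkpoints — a minimal sketch (the model name and prompt are only illustrative; the full 176B model lives at bigscience/bloom and needs far more hardware):

```python
# Quick way to try a small BLOOM checkpoint with the transformers pipeline.
# bigscience/bloom-560m is one of the smaller published sizes.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")
print(generator("The BLOOM language model is", max_new_tokens=30)[0]["generated_text"])
```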
AuspiciousApple OP t1_itwvawz wrote
Thanks, I'll take a look. Have you played around with them yourself?
Southern-Trip-1102 t1_itwyp8o wrote
A bit. As far as I can tell, they (the 176B one) are on par with GPT-3, though I haven't done much testing or comparison. They were also trained on 13 programming languages and 46 natural languages (59 in total), from what I read.
AuspiciousApple OP t1_itx00gv wrote
Thanks! Even a qualitative, subjective judgement of rough parity is quite encouraging. I might need DeepSpeed or similar to get it to run on my 8GB GPU, but if the quality is even comparable, that's very cool.
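For reference, here's roughly what I'd try first — a minimal sketch, assuming transformers and accelerate are installed, and using the smaller bloom-7b1 checkpoint since the full 176B model is far beyond an 8GB card even with offloading:

```python
# Memory-constrained loading via accelerate-style offloading: layers that
# don't fit on the GPU are placed on CPU RAM or spilled to disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    device_map="auto",          # let accelerate split layers across GPU/CPU
    torch_dtype=torch.float16,  # halve the memory footprint vs fp32
    offload_folder="offload",   # spill layers that still don't fit to disk
)

# The embedding layer usually lands on the GPU, so inputs go there too.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```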
visarga t1_itwxzgs wrote
My experience is that models that haven't had the instruction-tuning treatment don't behave nicely.
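For comparison, the instruction-tuned BLOOMZ checkpoints make the difference easy to see side by side — a minimal sketch, with a purely illustrative prompt:

```python
# Compare a base checkpoint against its instruction-tuned counterpart
# (BLOOMZ) on the same prompt; both are bigscience checkpoints on the Hub.
from transformers import pipeline

prompt = "Translate to French: I love reading.\nTranslation:"
for name in ["bigscience/bloom-560m", "bigscience/bloomz-560m"]:
    generator = pipeline("text-generation", model=name)
    out = generator(prompt, max_new_tokens=15)[0]["generated_text"]
    print(f"--- {name} ---\n{out}\n")
```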
Southern-Trip-1102 t1_itwyur3 wrote
Could that be because BLOOM was trained on a more varied dataset rather than being focused on English, given that it covers multiple natural and programming languages?