Southern-Trip-1102 t1_itwt1ac wrote
You might want to look into the BLOOM models on huggingface.
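If you just want to poke at them, something like this works with one of the small checkpoints — a minimal sketch (the model name and prompt are only illustrative; the full 176B model lives at bigscience/bloom and needs far more hardware):

```python
# Quick way to try a small BLOOM checkpoint with the transformers pipeline.
# bigscience/bloom-560m is one of the smaller published sizes.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")
print(generator("The BLOOM language model is", max_new_tokens=30)[0]["generated_text"])
```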
AuspiciousApple OP t1_itwvawz wrote
Thanks, I'll take a look. Have you played around with them yourself?
Southern-Trip-1102 t1_itwyp8o wrote
A bit. As far as I can tell, they (the 176B one) are on par with GPT-3, though I haven't done much testing or comparison. They were also trained on 13 programming languages and 46 natural languages (59 in total), from what I read.
AuspiciousApple OP t1_itx00gv wrote
Thanks! Even a qualitative, subjective judgement of rough parity is quite encouraging. I might need DeepSpeed or similar to get it to run on my 8GB GPU, but if the quality is even comparable, that's very cool.
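For reference, here's roughly what I'd try first — a minimal sketch, assuming transformers and accelerate are installed, and using the smaller bloom-7b1 checkpoint since the full 176B model is far beyond an 8GB card even with offloading:

```python
# Memory-constrained loading via accelerate-style offloading: layers that
# don't fit on the GPU are placed on CPU RAM or spilled to disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-7b1")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",
    device_map="auto",          # let accelerate split layers across GPU/CPU
    torch_dtype=torch.float16,  # halve the memory footprint vs fp32
    offload_folder="offload",   # spill layers that still don't fit to disk
)

# The embedding layer usually lands on the GPU, so inputs go there too.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```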
visarga t1_itwxzgs wrote
My experience is that models that haven't had the instruction-tuning treatment don't behave nicely.
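For comparison, the instruction-tuned BLOOMZ checkpoints make the difference easy to see side by side — a minimal sketch, with a purely illustrative prompt:

```python
# Compare a base checkpoint against its instruction-tuned counterpart
# (BLOOMZ) on the same prompt; both are bigscience checkpoints on the Hub.
from transformers import pipeline

prompt = "Translate to French: I love reading.\nTranslation:"
for name in ["bigscience/bloom-560m", "bigscience/bloomz-560m"]:
    generator = pipeline("text-generation", model=name)
    out = generator(prompt, max_new_tokens=15)[0]["generated_text"]
    print(f"--- {name} ---\n{out}\n")
```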
Southern-Trip-1102 t1_itwyur3 wrote
Could that be because BLOOM was trained on a more varied dataset rather than being focused on English, given that it covers multiple natural and programming languages?