__lawless t1_j76xpgk wrote on February 4, 2023 at 3:31 PM

Reply to comment by jaqws in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501

Just two points: a) They fine-tuned this model to death, whereas GPT-3.5 only has a handful of examples to work from. b) This is a multimodal model that consumes the image directly, whereas GPT can only consume text, so they fed it a caption of the image.

jaqws t1_j76zkhu wrote on February 4, 2023 at 3:44 PM

Ah, yeah I would agree that's not a fair comparison. Thanks for sharing.

kermunnist t1_j9kpp3s wrote on February 22, 2023 at 6:10 PM

I wonder how Flamingo would compare.