__lawless t1_j76xpgk wrote
Reply to comment by jaqws in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Just two points: a) They fine-tuned this model to death, whereas GPT-3.5 only got a handful of examples. b) This is a multimodal model that consumes the image directly, whereas GPT can only consume text, so they fed it a caption of the image.
jaqws t1_j76zkhu wrote
Ah, yeah I would agree that's not a fair comparison. Thanks for sharing.
kermunnist t1_j9kpp3s wrote
I wonder how Flamingo would compare.