__lawless t1_j76xpgk wrote

Just 2 points: a) They fine-tuned this model to death, whereas GPT-3.5 only had a handful of examples to fine-tune on. b) This is a multimodal model that consumes the image directly, whereas GPT can only consume text, so they fed it a caption of the image.

26

jaqws t1_j76zkhu wrote

Ah, yeah I would agree that's not a fair comparison. Thanks for sharing.

4