FirstOrderCat t1_j9ja0i4 wrote on February 22, 2023 at 11:06 AM

Multimodal here means the questions contain pictures.

So isn't it obvious GPT would underperform, since it doesn't work with pictures, lol?

EndTimer t1_j9kl706 wrote on February 22, 2023 at 5:43 PM

We would have to read the study methodology to evaluate how they were testing GPT 3.5's image context.

But in this case, multimodal refers to being trained on not just text (like GPT 3.5), but also images associated with that text.

That seems to have improved their model, which requires substantially fewer parameters while scoring higher, even in text-only domains.

kermunnist t1_j9kpbtz wrote on February 22, 2023 at 6:08 PM

That would make sense; having an understanding of images probably helps lead to a more intuitive grasp of physics.

MayoMark t1_j9mp29t wrote on February 23, 2023 at 1:54 AM

https://commons.m.wikimedia.org/wiki/Category:Animations_of_acceleration#/media/File%3AJumper.gif

FirstOrderCat t1_j9krjxb wrote on February 22, 2023 at 6:21 PM

>which requires substantially fewer parameters while scoring higher, even in text-only domains.

Which tests in the paper refer to text-only domains?

EndTimer t1_j9l1xxj wrote on February 22, 2023 at 7:24 PM

Presumably TXT (text context). LAN (language sciences) is unlikely to have many images in its multiple choice questions. The other science domains and G1-12 probably have mostly text questions.

FirstOrderCat t1_j9l4bho wrote on February 22, 2023 at 7:39 PM

Then what is IMG for GPT there?

And how come GPT performed better without seeing any context than when seeing text context?

EndTimer t1_j9l4tc6 wrote on February 22, 2023 at 7:42 PM

I don't know. It's going to be in the methodology of the paper, which neither of us has read.

FirstOrderCat t1_j9l8xn9 wrote on February 22, 2023 at 8:07 PM

Yes, and then you'd have to reproduce the results from both papers and check the code to make sure nothing creative happens in the datasets or during training. There are far more claims in academia than anyone has time to verify.

iamascii t1_j9o2v76 wrote on February 23, 2023 at 10:31 AM

They used the captions instead of the images. The captions are pretty descriptive imho.

FirstOrderCat t1_j9p55p9 wrote on February 23, 2023 at 4:07 PM

It still doesn't answer the second question.

duffmanhb t1_j9jps75 wrote on February 22, 2023 at 1:42 PM

So basically it's fine-tuned for this specific topic, whereas GPT is large because it's trained on a general dataset for multi-domain use.

[deleted] t1_j9jlyhh wrote on February 22, 2023 at 1:11 PM

[deleted]

What. The. ***k. [less than 1B parameter model outperforms GPT 3.5 in science multiple choice questions]

Destiny_Knight OP t1_j9iysi7 wrote on February 22, 2023 at 8:31 AM