
farmingvillein t1_jdhua51 wrote

Hmm, what do you mean by "publicly"? OpenAI has publicly stated that GPT-4 is multi-modal, and that they simply haven't exposed the image API yet.

The image API isn't publicly available yet, but it is clearly coming.

9

loopuleasa t1_jdhuit0 wrote

I'm talking about consumer access to the image API.

Consumer access is tricky, as the system is already swamped with text.

they mentioned an image takes 30 seconds to "comprehend" by the model...

13

MysteryInc152 t1_jdj8x5e wrote

>they mentioned an image takes 30 seconds to "comprehend" by the model...

Wait, really? Can you link a source or something? There's no reason a native implementation should take that long.

Now I'm wondering if they're just doing something like this: https://github.com/microsoft/MM-REACT

3

yashdes t1_jdij1tl wrote

these models are very sparse, meaning very few of the calculations actually affect the output. My guess is that trimming the model is how they got gpt3.5-turbo, and I wouldn't be surprised if gpt4-turbo is coming.
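
To make the sparsity claim concrete, here is a minimal sketch of magnitude pruning in plain Python. It is only an illustration under stated assumptions: the toy layer, its heavy-tailed weight distribution, and the helper names `magnitude_prune`, `matvec`, and `relative_error` are all hypothetical, and nothing here reflects how gpt3.5-turbo was actually built.

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to zero.

    `sparsity` is the fraction of entries to zero out (0.9 means drop 90%).
    """
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)  # number of weights to drop
    if k == 0:
        return [row[:] for row in weights]
    threshold = flat[k - 1]  # largest magnitude that still gets dropped
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def matvec(weights, x):
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relative_error(reference, approx):
    """Euclidean distance between the vectors, relative to the reference norm."""
    num = sum((r - a) ** 2 for r, a in zip(reference, approx)) ** 0.5
    den = sum(r ** 2 for r in reference) ** 0.5
    return num / den


rng = random.Random(0)
rows, cols = 64, 64

# ASSUMPTION: a toy layer where about 10% of weights are large and the rest
# are tiny. Real model weights may look nothing like this.
weights = [
    [rng.gauss(0, 1.0) if rng.random() < 0.1 else rng.gauss(0, 0.01)
     for _ in range(cols)]
    for _ in range(rows)
]
x = [rng.gauss(0, 1.0) for _ in range(cols)]

pruned = magnitude_prune(weights, sparsity=0.9)
zero_fraction = sum(w == 0.0 for row in pruned for w in row) / (rows * cols)
err = relative_error(matvec(weights, x), matvec(pruned, x))

print(f"fraction of weights zeroed: {zero_fraction:.3f}")
print(f"relative change in output:  {err:.3f}")
```

In this toy setup most of the weights can be dropped with only a small change in the output, which is the sense of "sparse" used above. Zeroing weights this way does not by itself make inference faster: a dense kernel still multiplies by the zeros, so any speed-up depends on sparse-aware kernels or hardware support.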

0

farmingvillein t1_jdj9w98 wrote

> these models are very sparse

Hmm, do you have any sources for this assertion?

It isn't entirely unreasonable, but 1) GPU speed-ups from sparsity aren't that high (unless OpenAI is doing something crazy secret/special...possible?), so this isn't actually that big of an upswing (unless we're including MoE?), and 2) OpenAI hasn't released architecture details (beyond the original GPT-3 paper, which did not indicate that the model was "very" sparse).

1

SatoshiNotMe t1_jdkd8l5 wrote

I’m curious about this as well. I see it’s multimodal but how do I use it with images? The ChatGPT Plus interface clearly does not handle images. Does the API handle images?

1

farmingvillein t1_jdkdjye wrote

> I see it’s multimodal but how do I use it with images?

You unfortunately can't right now--the image handling is not publicly available, although supposedly the model is capable.

1