loopuleasa t1_jdhuit0 wrote
Reply to comment by farmingvillein in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Talking about consumer access to the image API is tricky, as the system is already swamped with text.
They mentioned an image takes 30 seconds to "comprehend" by the model...
MysteryInc152 t1_jdj8x5e wrote
>they mentioned an image takes 30 seconds to "comprehend" by the model...
Wait, really? Can you link a source or something? There's no reason a native implementation should take that long.
Now I'm wondering if they're just doing something like this: https://github.com/microsoft/MM-REACT
yashdes t1_jdij1tl wrote
These models are very sparse, meaning very few of the calculations actually affect the output. My guess is that trimming the model is how they got gpt-3.5-turbo, and I wouldn't be surprised if gpt-4-turbo is coming.
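(For readers unfamiliar with the idea: a minimal sketch of what "trimming" a dense model via magnitude pruning might look like, assuming PyTorch. This is purely illustrative; OpenAI has not confirmed pruning or any specific technique for gpt-3.5-turbo.)

```python
# Hypothetical magnitude-pruning sketch: zero out the smallest-magnitude
# weights so most parameters no longer contribute to the output.
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a copy of `weight` with the smallest `sparsity` fraction zeroed."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

# Example: prune 90% of a single linear layer's weights
layer = torch.nn.Linear(4096, 4096)
with torch.no_grad():
    layer.weight.copy_(magnitude_prune(layer.weight, sparsity=0.9))
```

In practice pruned models are usually fine-tuned afterwards to recover accuracy, and the speed-up depends on whether the hardware can exploit the resulting sparsity pattern.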
farmingvillein t1_jdj9w98 wrote
> these models are very sparse
Hmm, do you have any sources for this assertion?
It isn't entirely unreasonable, but 1) GPU speed-ups for sparsity aren't that high (unless OpenAI is doing something crazy secret/special... possible?), so this isn't actually that big of an upswing (unless we're including MoE?), and 2) OpenAI hasn't released architecture details (beyond the original GPT-3 paper, which did not indicate that the model was "very" sparse).
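(For context on the MoE aside: a minimal sketch of top-k mixture-of-experts routing, assuming PyTorch. Names like `TopKMoE` are illustrative, and this is not a claim about GPT-4's actual architecture.)

```python
# Illustrative top-k MoE layer: each token is routed to only k of the experts,
# so most of the layer's weights are untouched for any given token.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                      # [tokens, n_experts]
        weights, idx = scores.topk(self.k, dim=-1)   # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(16, 512)        # 16 tokens, d_model = 512
y = TopKMoE(d_model=512)(x)     # output has the same shape as x
```

The point of the aside above is that MoE-style sparsity (skipping whole experts per token) is the kind that actually saves compute at inference time, unlike unstructured weight sparsity, which current GPUs accelerate only modestly.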