loopuleasa t1_jdhuit0 wrote
Reply to comment by farmingvillein in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
Talking about consumer access to the image API is tricky, as the system is already swamped with text.
They mentioned an image takes 30 seconds to "comprehend" by the model...
MysteryInc152 t1_jdj8x5e wrote
>they mentioned an image takes 30 seconds to "comprehend" by the model...
Wait, really? Can you link a source or something? There's no reason a native implementation should take that long.
Now I'm wondering if they're just doing something like this: https://github.com/microsoft/MM-REACT
yashdes t1_jdij1tl wrote
These models are very sparse, meaning very few of the calculations actually affect the output. My guess is that trimming the model is how they got gpt-3.5-turbo, and I wouldn't be surprised if gpt-4-turbo is coming.
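(For readers unfamiliar with the idea: a minimal sketch of what "trimming" a dense model via magnitude pruning might look like, assuming PyTorch. This is purely illustrative; OpenAI has not confirmed pruning or any specific technique for gpt-3.5-turbo.)

```python
# Hypothetical magnitude-pruning sketch: zero out the smallest-magnitude
# weights so most parameters no longer contribute to the output.
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a copy of `weight` with the smallest `sparsity` fraction zeroed."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

# Example: prune 90% of a single linear layer's weights
layer = torch.nn.Linear(4096, 4096)
with torch.no_grad():
    layer.weight.copy_(magnitude_prune(layer.weight, sparsity=0.9))
```

In practice pruned models are usually fine-tuned afterwards to recover accuracy, and the speed-up depends on whether the hardware can exploit the resulting sparsity pattern.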
farmingvillein t1_jdj9w98 wrote
> these models are very sparse
Hmm, do you have any sources for this assertion?
It isn't entirely unreasonable, but 1) GPU speed-ups for sparsity aren't that high (unless OpenAI is doing something crazy secret/special... possible?), so this isn't actually that big of an upswing (unless we're including MoE?), and 2) OpenAI hasn't released architecture details (beyond the original GPT-3 paper, which did not indicate that the model was "very" sparse).
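(For context on the MoE aside: a minimal sketch of top-k mixture-of-experts routing, assuming PyTorch. Names like `TopKMoE` are illustrative, and this is not a claim about GPT-4's actual architecture.)

```python
# Illustrative top-k MoE layer: each token is routed to only k of the experts,
# so most of the layer's weights are untouched for any given token.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                      # [tokens, n_experts]
        weights, idx = scores.topk(self.k, dim=-1)   # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(16, 512)        # 16 tokens, d_model = 512
y = TopKMoE(d_model=512)(x)     # output has the same shape as x
```

The point of the aside above is that MoE-style sparsity (skipping whole experts per token) is the kind that actually saves compute at inference time, unlike unstructured weight sparsity, which current GPUs accelerate only modestly.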