MysteryInc152 t1_jdj8x5e wrote
Reply to comment by loopuleasa in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
>they mentioned an image takes 30 seconds to "comprehend" by the model...
Wait, really? Can you link a source or something? There's no reason a native implementation should take that long.
Now I'm wondering if they're just doing something like this: https://github.com/microsoft/MM-REACT