SeymourBits t1_jdlwrgi wrote on March 25, 2023 at 11:09 AM

Reply to comment by itsnotlupus in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

This is the most accurate comment I've come across. The entire system is only as good and granular as the CLIP text description that's passed into GPT-4, which then has to "imagine" the described image, often with varying degrees of hallucination. I've used it and can confirm that it is not currently possible to operate anything close to a GUI with this approach.

shitasspetfuckers t1_jed7vuu wrote on March 31, 2023 at 3:58 AM

Can you please clarify what specifically you have tried, and what was the outcome?