SeymourBits t1_jdlwrgi wrote
Reply to comment by itsnotlupus in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
This is the most accurate comment I've come across. The entire system is only as good and as granular as the CLIP text description that's passed into GPT-4, which then has to "imagine" the described image, often with varying degrees of hallucination. I've used it and can confirm that, with the current approach, it is not yet possible to operate anything close to a GUI.
shitasspetfuckers t1_jed7vuu wrote
Can you please clarify what specifically you tried, and what the outcome was?