SgathTriallair t1_jeerghs wrote
Reply to comment by Relevant_Ad7319 in Language Models can Solve Computer Tasks (by recursively criticizing and improving its output) by rationalkat
The task paper addressed this. If it can see the screen, then in most cases a keyboard and mouse API will be the best option.
It knows where to click on the screen because it is trained to understand images just as it understands text. So it will know that a trash can icon means you want to delete data, the same way we do.