SgathTriallair t1_jeerghs wrote

The task paper addressed this. If it can see the screen, then in most cases a keyboard and mouse API will be the best option.

It knows where to click on the screen because it is trained to understand images just like it understands text. So it will know that a trash can icon means you want to delete data, the same way we do.
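
To make that concrete, here is a rough sketch of the last step: turning "the model sees a trash can" into a point to click. Everything in it is an assumption for illustration. The `Detection` format, the labels, and the coordinates are made up, and a real system would get them from an image-understanding model and pass the result to whatever keyboard and mouse API it uses.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One UI element a vision model reports seeing (hypothetical output format)."""
    label: str
    left: int
    top: int
    width: int
    height: int


def click_point(detections, target_label):
    """Return the (x, y) centre of the first detection matching target_label, or None."""
    for det in detections:
        if det.label == target_label:
            return (det.left + det.width // 2, det.top + det.height // 2)
    return None


# Stand-in for what an image-understanding model might return for a screenshot.
detections = [
    Detection("folder", left=40, top=40, width=64, height=64),
    Detection("trash can", left=900, top=600, width=48, height=48),
]

target = click_point(detections, "trash can")
# A real agent would hand `target` to a keyboard and mouse API at this point.
```

The hard part is the detection itself, which is what the image training provides. Once the model can label what is on screen, working out where to click is just arithmetic on the bounding box.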
