
YonatanBitton OP t1_iu02pl6 wrote

This is a great point, thank you. The interpretation of common sense tasks varies from person to person, and common sense reasoning involves some ambiguity. WinoGAViL, however, only keeps instances that three human solvers solved well (over 80% Jaccard index). To validate our dataset, we recruited other players (who did not take part in the data generation task) and verified that they solved it with high accuracy (90%).
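For anyone unfamiliar with the metric, here is a rough sketch of how a Jaccard-based filter like this could look. The function names, the example data, and the choice to average the three solvers' scores are assumptions for illustration, not the actual WinoGAViL pipeline.

```python
def jaccard_index(selected, gold):
    """Jaccard index: |intersection| / |union| of two sets of chosen items."""
    selected, gold = set(selected), set(gold)
    union = selected | gold
    if not union:
        # Both sets empty: treat as perfect agreement (a convention, not from the thread).
        return 1.0
    return len(selected & gold) / len(union)


def keep_instance(solver_selections, gold, threshold=0.8):
    """Keep an instance if the solvers' average Jaccard score exceeds the threshold.

    Averaging across solvers is an assumption; the thread only says
    "over 80% Jaccard index" for three human solvers.
    """
    scores = [jaccard_index(s, gold) for s in solver_selections]
    return sum(scores) / len(scores) > threshold


# Hypothetical instance: the gold selection and three solvers' picks.
gold = {"img_1", "img_4"}
solvers = [{"img_1", "img_4"}, {"img_1", "img_4"}, {"img_1"}]
kept = keep_instance(solvers, gold)
```

In this made-up case two solvers match the gold selection exactly and one gets half of it, so the average lands just above the 0.8 cutoff and the instance would be kept.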

4

shahaff32 t1_iu0mobx wrote

Thank you for your answer, we will look into it :)

1