blose1 t1_j4td3lq wrote on January 18, 2023 at 2:52 AM

Reply to comment by mrconter1 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1

>Recognize the Gmail icon if I say "send an email"

This is not testing intelligence; it is testing whether a human was trained on computer usage, knows what e-mail is, and has used Gmail before.

Someone from a tribe in Africa would fail your test even though he is human and intelligent. Train him on this task like you would train a current-gen multimodal system and he will pass your benchmark. You train an LLM in combination with an image model and an RL model, train it on instruction following using the inputs you described, and now it understands what it sees and can follow what you want it to do.

mrconter1 OP t1_j4tuaal wrote on January 18, 2023 at 5:14 AM

> This is not testing intelligence; it is testing whether a human was trained on computer usage, knows what e-mail is, and has used Gmail before.

I don't think it's binary. I think intelligence plays a large part here.

> Someone from a tribe in Africa would fail your test even though he is human and intelligent.

Could you train a bird to pass all questions on this benchmark? No, because it's not as intelligent as a human.

> Train him on this task like you would train a current-gen multimodal system and he will pass your benchmark. You train an LLM in combination with an image model and an RL model, train it on instruction following using the inputs you described, and now it understands what it sees and can follow what you want it to do.

So solving this benchmark is an easy problem? How long do you think it will take until we have a model that can casually follow all the instructions I gave in the previous comment?