liqui_date_me t1_jdrd9dx wrote on March 26, 2023 at 4:19 PM

One could argue that even standardized tests are somewhat boilerplate - if you practice enough SAT tests you’ll eventually do quite well at them, since the questions are quite similar from exam to exam. Ditto for AP exams.

I think a serious test for GPT4’s intelligence will be on one of the competitive entrance exams for some countries, like the IIT-JEE or the Gaokao or the International Math Olympiad, where the questions are made by domain experts and are designed to be intentionally difficult and specialized to solve.

enryu42 OP t1_jdrezba wrote on March 26, 2023 at 4:31 PM

I don't know about IIT-JEE/Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair - it is more in the realm of "the best human"-level intelligence.

To be fair, the hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skillset, and yet GPT4 cannot solve any of them (and it doesn't look like it is lacking knowledge).

blose1 t1_jdskab0 wrote on March 26, 2023 at 9:23 PM

These models have access to all human knowledge, all scientific papers, books etc. If I had such knowledge, I could solve any Olympiad task.

visarga t1_jdtxxfd wrote on March 27, 2023 at 4:07 AM

You're mistaken, Olympiad problems require bespoke tricks that don't generalise from problem to problem. It's not a problem of breadth of knowledge, they don't test memorisation.

blose1 t1_jdu4cln wrote on March 27, 2023 at 5:17 AM

What? Where exactly am I mistaken? Both of my statements are true. There is a 0% chance you can solve an Olympiad task without knowledge. A human with all the knowledge WILL reason and come up with a solution BASED on the knowledge he has AND the experience of others that is part of that knowledge; if that weren't true, no human would solve any Olympiad problem. Sorry, but what you wrote in the context of my comment is just ridiculous, and looks like a reply to something I didn't write.

[deleted] t1_jdrxdrd wrote on March 26, 2023 at 6:41 PM

[deleted]

currentscurrents t1_jdrt3gv wrote on March 26, 2023 at 6:11 PM

I think all tests designed for humans are worthless here.

They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.

Yecuken t1_jdsm4w1 wrote on March 26, 2023 at 9:36 PM

Tests would not help against optimization; models will just learn how to pass the test. Optimization will always win against any problem with a known solution.

maxToTheJ t1_jdrx281 wrote on March 26, 2023 at 6:39 PM

> which supposedly require some amount of creative reasoning.

They don't, which is exactly what teachers have been complaining about with regard to standardized testing.

[D] GPT4 and coding problems

enryu42 OP t1_jdrbyh5 wrote on March 26, 2023 at 4:10 PM