
enryu42 OP t1_jdrbyh5 wrote

Arithmetic can be solved in a Toolformer-like way, by just giving it access to a calculator. But this wouldn't help with coding.

Regarding the point about boilerplate, this is exactly what is surprising: GPT4 performs very well on exams/tests, which supposedly require some amount of creative reasoning. So either the tests are poorly designed, or it can do some creative tasks but not others. If the latter is the case, it would be interesting to learn which areas it performs well in, and why.

21

liqui_date_me t1_jdrd9dx wrote

One could argue that even standardized tests are somewhat boilerplate - if you practice enough SAT tests you’ll eventually do quite well at them, since the questions are quite similar from exam to exam. Ditto for AP exams.

I think a serious test of GPT4’s intelligence will be one of the competitive entrance exams in some countries, like the IIT-JEE or the Gaokao, or the International Math Olympiad, where the questions are made by domain experts and are intentionally designed to be difficult and specialized.

19

enryu42 OP t1_jdrezba wrote

I don't know about IIT-JEE/Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair - it is more in the realm of "the best human"-level intelligence.

To be fair, the hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skillset, and yet GPT4 cannot solve any of them (and it doesn't look like it is lacking knowledge).

16

blose1 t1_jdskab0 wrote

These models have access to all human knowledge, all scientific papers, books etc. If I had such knowledge, I could solve any Olympiad task.

3

visarga t1_jdtxxfd wrote

You're mistaken. Olympiad problems require bespoke tricks that don't generalise from problem to problem. It's not a problem of breadth of knowledge; they don't test memorisation.

6

blose1 t1_jdu4cln wrote

What? Where exactly am I mistaken? Because both of my statements are true. And there is 0% chance you can pass an Olympiad task without knowledge. A human with all the knowledge WILL reason and come up with a solution BASED on the knowledge he has AND the experience of others that is part of that knowledge. If that weren't true, then no human would solve any Olympiad problem. Sorry, but what you wrote in the context of my comment is just ridiculous, and looks like a reply to something I didn't write.

4

currentscurrents t1_jdrt3gv wrote

I think all tests designed for humans are worthless here.

They're all meant to compare humans against each other, so they assume you don't have the ability to read and remember the entire internet. You can make up for a lack of reasoning with an abundance of data. We need synthetic tests designed specifically for LLMs.

11

Yecuken t1_jdsm4w1 wrote

Tests would not help against optimization; models will just learn how to pass the test. Optimization will always win against any problem with a known solution.

2

maxToTheJ t1_jdrx281 wrote

> which supposedly require some amount of creative reasoning.

They don't, which is exactly what has been part of teachers' complaints with regard to standardized testing.

3