Submitted by Balance- t3_124eyso in MachineLearning
master3243 t1_jdzec5r wrote
Reply to comment by hadaev in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Since they made sure the bar exam and math olympiad tests were recent ones, explicitly stated not to be in the training dataset, I trusted that all the other reported tests had been picked just as carefully to avoid contamination.
MotionTwelveBeeSix t1_jdzurlg wrote
The bar exams recycle the same questions every year; there’s very little original about them. It’s a test of pure memorization.