master3243 t1_jdzec5r wrote on March 28, 2023 at 9:06 AM

Reply to comment by hadaev in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-

Seeing how they made sure the bar exam and the math olympiad tests were recent ones that were explicitly stated to not be in the training dataset to avoid contamination, I trusted that all the other reported tests were also as carefully picked to avoid contamination.

MotionTwelveBeeSix t1_jdzurlg wrote on March 28, 2023 at 12:21 PM

The bar exams recycle the same questions every year, there’s very little original about them. Its a test of pure memorization