regalalgorithm t1_je1eu1e wrote on March 28, 2023 at 6:40 PM

FYI, the GPT-4 paper has a whole section on contamination in the appendix - I found it to be pretty convincing. Removing contaminating data did make it worse at some benchmarks, but also better at others, and overall it wasn't a huge effect.

StellaAthena t1_je3tz04 wrote on March 29, 2023 at 5:28 AM

I found this analysis incredibly unconvincing. They used a weaker deduplication standard than is typical, as well as a weaker analysis than the one they did for the GPT-3 paper.