Submitted by tysam_and_co t3_10op6va in MachineLearning
tysam_and_co OP t1_j6o72ma wrote
Reply to comment by tysam_and_co in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Okay, I ran some other experiments and I'm starting to get giddy (you're the first 'ta know! :D). It appears that for most hyperparameters, twiddling them on CIFAR100 gives either a flat response or a slight downward trend (!!!), i.e. the values carried over from CIFAR10 already look close to optimal. I haven't messed with them all yet, but that bodes very, very well (!!!!).
Also, doing the classical range boost of scaling depth 64->128 and num_epochs 10->80 gives a jump to about 80% accuracy in roughly 3 minutes of training, which is about where CIFAR100 SOTA was in early 2016. It's harder to place for CIFAR10, since I think that benchmark was slightly more popular and had a monstrous jump followed by a long flat stretch during that period. But if you do some linear/extremely coarse piecewise interpolation of the PapersWithCode accuracy curves, from the rough starting point of CIFAR10 to the current day, and do the same for CIFAR100, then adding this extra capacity + training time moves both from roughly 2015 SOTA numbers to roughly early-2016 SOTA numbers. Wow!! That's incredible! This is starting to make me really giddy, good grief.
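To make that concrete, here's a rough sketch of the kind of change I mean; the dict layout and key names below are illustrative, not the actual ones from the training script:

```python
# Illustrative only: the key names ('base_depth', 'train_epochs') are assumed,
# not copied from the real hlb-CIFAR10 config.
hyp = {
    'net':  {'base_depth': 64},    # width used for the ~10-second CIFAR10 runs
    'misc': {'train_epochs': 10},  # baseline epoch count
}

def scale_up(hyp):
    """The 'classical range boost' described above: double the network width
    and train 8x longer (depth 64 -> 128, epochs 10 -> 80)."""
    hyp['net']['base_depth'] = 128
    hyp['misc']['train_epochs'] = 80
    return hyp

scaled_hyp = scale_up(hyp)
```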
I'm curious whether Cutout or anything else will help; we'll see! There's definitely a much bigger train<->eval accuracy gap here than on CIFAR10, but adding more regularization may not help as much as it would seem up front.
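If I do end up trying Cutout, it's only a few lines; a minimal sketch (illustrative, not from the actual training script) assuming PyTorch batches of shape (N, C, H, W):

```python
import torch

def cutout(images: torch.Tensor, size: int = 8) -> torch.Tensor:
    """Apply Cutout (DeVries & Taylor, 2017): zero out one random square patch
    per image. `size=8` is just an assumed patch size for 32x32 CIFAR inputs,
    not a value from the post."""
    images = images.clone()
    n, _, h, w = images.shape
    for i in range(n):
        # Random patch center; the patch is clipped at the image borders.
        cy = torch.randint(0, h, (1,)).item()
        cx = torch.randint(0, w, (1,)).item()
        y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
        x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
        images[i, :, y0:y1, x0:x1] = 0.0
    return images

# Example: augment a training batch of CIFAR-sized images.
batch = torch.rand(4, 3, 32, 32)
augmented = cutout(batch, size=8)
```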