Submitted by tysam_and_co t3_10op6va in MachineLearning
tysam_and_co OP t1_j6o25u4 wrote
Reply to comment by arhetorical in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Oh, I see! Yeah, I'll probably want to leave the process spawning/forking stuff to the side, as that can require some careful, bug-resistant refactoring IIRC. However! I believe it would only require a change around the dataloaders and maybe some stuff at the beginning of the file. I'm unfortunately terribly rusty on this, but you might be able to get away with changing `num_dataloaders=2` to `num_dataloaders=0` in your file; I believe that would run much more slowly the first time, then the same after, without any forking issues?
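Roughly, the idea looks like this in plain PyTorch/torchvision terms -- just a sketch, not the actual notebook code, and the parameter in stock PyTorch is called `num_workers`:

```python
# Minimal sketch: a worker count of 0 keeps all data loading in the main process,
# which sidesteps fork/spawn issues (e.g. in notebooks or on Windows) at the cost
# of a slower first pass through the data.
import torch
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                          transform=T.ToTensor())

train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=512,
    shuffle=True,
    num_workers=0,   # was 2; 0 = no worker processes are forked/spawned
    pin_memory=True,
)
```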
As for CIFAR100, I made the absolute minimum number of changes/additions to get it running, which was 3 characters: I added a 0 to each of the two main dataloaders, and then one 0 to the num_classes parameter. On the first run with this, I'm averaging about 75.49% validation accuracy, which roughly matches the 2015 SOTA for CIFAR100. The 2015 SOTA for CIFAR10 was roughly 94%, so I believe we're in very good hands here! This bodes quite well, I think, though I'm not sure.

This was also the first blind run, with no other tuning or anything (well, I had to do it again on the right notebook base, as I accidentally pulled an older version that came in about ~.8% below this one -- and over 10 seconds!). Interestingly to me, we're still running at right about ~9.91-9.94 seconds; I would have thought the extra 90 classes would add some appreciable overhead! Crazy! :D That opens a lot of cool avenues (ImageNet?!?!) that I've been sort of hardcore ignoring as a result. Goes to show, I guess, that there's basically no replacement for really good testing! :D :)))) I wouldn't be surprised if one could get more performance with more tuning -- though it would be surprising if we were simply at a local maximum already! Either way, I find it somewhat validating.
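In spirit, the "three extra zeros" look something like this (a sketch in plain torchvision terms -- the real notebook's structure differs, and the names here are illustrative):

```python
# CIFAR10 -> CIFAR100 in the two dataset/dataloader constructors (two added '0's)
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR100(root='./data', train=True,  download=True, transform=T.ToTensor())
eval_set  = torchvision.datasets.CIFAR100(root='./data', train=False, download=True, transform=T.ToTensor())

# num_classes 10 -> 100 in the network head (the third added '0')
num_classes = 100
```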
Thank you for being the first person to comment on and support my work. You really made my day back then, and as of yesterday the project was being tweeted about by Karpathy. I'm about equally appreciative of you both for your support and kindness -- much love! <3 :)))) <3 :D
tysam_and_co OP t1_j6o72ma wrote
Okay, I ran some other experiments and I'm starting to get giddy (you're the first 'ta know! :D). It appears that for most hyperparameters, twiddling them on CIFAR100 gives just a flat response, or a slight downward trend (!!!). I haven't messed with them all yet, but that bodes very, very well (!!!!).
Also, doing the classical range boost of changing depth 64->128 and num_epochs 10->80 results in a boost to about 80% in roughly 3 minutes of training, which is about where CIFAR100 SOTA was in early 2016 (sketch below). It's harder to compare for CIFAR10, since I think that one was slightly more popular and had a monstrous jump followed by a long flat stretch during that period. But if you do some linear/extremely coarse piecewise interpolation on PapersWithCode from the average starting point of CIFAR10 accuracy to the current day, and do the same roughly for CIFAR100, then adding this extra capacity + training time moves them both from ~2015 SOTA numbers to ~early-2016 SOTA numbers. Wow!! That's incredible! This is starting to make me really giddy, good grief.
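For reference, the two knobs I'm turning amount to something like this (illustrative only -- the actual hyperparameter names/structure in the notebook differ):

```python
# Sketch of the "classical range boost": widen the network and train longer.
hyp = {
    'net': {
        'base_depth': 128,   # was 64: roughly doubles network width/capacity
    },
    'opt': {
        'num_epochs': 80,    # was 10: ~8x more training, ~3 minutes total here
    },
}
```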
I'm curious whether cutout or anything else will help -- we'll see! There's definitely a much bigger train<->eval accuracy gap here, but adding more regularization may not help as much as it would seem up front.
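If you want to try cutout yourself, a minimal version (DeVries & Taylor, 2017) is just zeroing a random square per image -- this is a self-contained sketch, not what's in the notebook, and the patch size is an assumption:

```python
import torch

def cutout(images: torch.Tensor, patch_size: int = 8) -> torch.Tensor:
    """Zero out one random patch_size x patch_size square per image in an NCHW batch."""
    n, _, h, w = images.shape
    ys = torch.randint(0, h - patch_size + 1, (n,))
    xs = torch.randint(0, w - patch_size + 1, (n,))
    out = images.clone()
    for i in range(n):
        y, x = int(ys[i]), int(xs[i])
        out[i, :, y:y + patch_size, x:x + patch_size] = 0.0  # mask the patch
    return out
```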