Submitted by tysam_and_co t3_10op6va in MachineLearning
unhealthySQ t1_j6g2h65 wrote
Reply to comment by tysam_and_co in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
So just to be sure I read things correctly, this project is about optimizing training speed for Transformer neural networks?
tysam_and_co OP t1_j6g3e49 wrote
Hello! Thanks so much for the comment, I really appreciate it. This is a convnet-based architecture, so it's carrying on the torch of some of the old DAWNBench entries.
Transformers have the best top-end performance of all the neural network families, and convolutional networks tend to have an edge in the smaller/tiny regime, IIRC. One could maximize training speed for a transformer architecture, but the cost of just 1-2 transformer layers could be several times the cost of an entire forward pass through this very tiny convnet. I even tried adding a really tiny 16x16 attention multiply at the end of the network and it totally tanked the training speed.
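(To give a rough feel for what I mean, here's a hypothetical, standalone PyTorch toy -- not the actual project code, and the layer sizes are made up for illustration -- that times a tiny convnet's forward pass with and without a small self-attention block bolted onto the end:)

```python
# Hypothetical toy benchmark (not the project's code): compare a tiny convnet's
# forward pass with and without a small self-attention block appended at the end.
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

tiny_convnet = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.GELU(),
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.GELU(),  # 32x32 -> 8x8 spatial
).to(device)

# One small attention layer over the 8x8 = 64 spatial positions.
attention = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True).to(device)

def forward(x, use_attention):
    feats = tiny_convnet(x)                        # (B, 128, 8, 8)
    if use_attention:
        tokens = feats.flatten(2).transpose(1, 2)  # (B, 64, 128)
        tokens, _ = attention(tokens, tokens, tokens)
        feats = tokens.transpose(1, 2).reshape_as(feats)
    return feats.mean(dim=(2, 3))                  # global average pool

def time_it(use_attention, iters=50):
    x = torch.randn(512, 3, 32, 32, device=device)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        for _ in range(iters):
            forward(x, use_attention)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"convnet only:   {time_it(False) * 1e3:.2f} ms/iter")
print(f"convnet + attn: {time_it(True) * 1e3:.2f} ms/iter")
```

When the baseline network is this small, even one modest attention layer can be a noticeable fraction of (or more than) the whole convnet's runtime, which is the problem I ran into.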
However, that said, I'd really like to pick up the work of https://arxiv.org/abs/2212.14034 and continue from there. The concept of getting an algorithm to really compress that info can start opening up the horizon to some of the hard laws that underlie neural network training in the limit. For example, somewhere along the way, this project's convnet apparently ended up showing really strong consistency with scaling laws. I'm not sure why.
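(By "consistency with scaling laws" I just mean the usual kind of check: fit a power law to, say, final loss versus model size and see how tightly the points land on it. Here's a minimal, made-up sketch of that kind of fit -- the numbers are synthetic placeholders, not measurements from this project:)

```python
# Minimal illustration (not the project's actual numbers): fit a power law
#   loss(N) ~ a * N^(-b)
# to (model size, final loss) pairs and check how tightly the points sit on it.
import numpy as np

# Synthetic stand-in data generated from a power law plus a little noise,
# purely so the example runs end to end -- not real measurements.
rng = np.random.default_rng(0)
param_counts = np.array([0.5e6, 1e6, 2e6, 4e6, 8e6])
losses = 30.0 * param_counts ** -0.25 * np.exp(rng.normal(0, 0.01, size=5))

# In log-log space a power law is a straight line: log(loss) = log(a) - b*log(N).
slope, intercept = np.polyfit(np.log(param_counts), np.log(losses), deg=1)
a, b = np.exp(intercept), -slope

residuals = np.log(losses) - (intercept + slope * np.log(param_counts))
print(f"fit: loss ~ {a:.2f} * N^(-{b:.3f})")
print(f"max log-residual: {np.abs(residuals).max():.4f}")  # small => strong consistency
```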
But in any case -- language models are hopefully next (if I get the time and have the interest/don't burn myself out on this project in the meantime!). I'll probably be focused on picking up some part-time research work in the field between here and then first, as that's my first priority right now (aside from a few community code contributions; this codebase is my living resume, after all, and I think a good one at that! :D)
Hope that helped answer your question, and if not, please let me know and I'll give you my best shot! :D
unhealthySQ t1_j6g4zsz wrote
Thank you for the answer!
Your work is highly impressive and I wish you continued success in your efforts, as I could see the work you do here having very appealing applications down the line.
tysam_and_co OP t1_j6g5fb8 wrote
Thank you very much, I appreciate your kind words. Good luck to you in all of your future endeavors as well! :D :) <3 <3 :))))