neuroguy123 t1_ispbbs0 wrote
Reply to comment by i-heart-turtles in [D] Now that Colab has introduced "compute units". Which are the best free/cheap alternatives? by zuccoff
I have been a long-time user of Colab Pro+ and that might be true, but I have noticed a significant difference in the quality of GPU I get now. You can now specify a 'premium' GPU vs. standard. I can get an A100 40GB any time I want right now, whereas before that was rare. For 100 credits you can train on it for about 6.5 hours. Anyway, I think you're right that they just give you more transparency, but you also get more choice in how aggressively you want to spend the credits. I suspect that the A100s are more available now after these changes.
What I do now is develop on my desktop machine until I have everything working properly, and then just train on Colab as needed when I need more compute.
neuroguy123 t1_isjnunn wrote
Reply to comment by Nyanraltotlapun in [D] Simple Questions Thread by AutoModerator
I think you're overthinking it. Maybe try encoding your data as the vector change from one point to the next if you want to help the network learn about relative changes. See Graves' sequence-generation paper for an example of this: https://arxiv.org/pdf/1308.0850.pdf
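If it helps, here is a minimal sketch of what that delta (offset) encoding might look like for a 2-D sequence; the toy array and values are made up purely for illustration:

```python
import numpy as np

# Hypothetical toy sequence of 2-D points (e.g., pen or sensor positions), shape (T, 2).
points = np.array([[0.0, 0.0], [1.0, 2.0], [1.5, 2.5], [3.0, 2.0]])

# Encode each timestep as the offset from the previous point; the first "offset"
# is taken from the origin so the encoding stays invertible.
deltas = np.diff(points, axis=0, prepend=np.zeros((1, points.shape[1])))

# Decoding is just a cumulative sum over the offsets.
reconstructed = np.cumsum(deltas, axis=0)
assert np.allclose(reconstructed, points)
```

The network then sees relative changes rather than absolute coordinates, which is how the handwriting model in the paper above represents pen movement.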
neuroguy123 t1_irtuo1b wrote
I recommend some of the YOLO versions. I had fun with those and learned a lot about complex loss functions.
I also implemented a bunch of attention models, starting with Graves', then Bahdanau and Luong, and then Transformers. The history of attention in deep learning is very interesting and instructive to implement.
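For reference, this is a minimal sketch of the scaled dot-product attention at the core of the Transformer (the earlier Bahdanau/Luong variants use additive or multiplicative scoring instead); it is a generic illustration, not code from any particular paper:

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_model); mask is True where attention is blocked.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5       # (batch, q_len, k_len)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention distribution
    return weights @ v, weights

# Toy self-attention over random tensors.
q = k = v = torch.randn(2, 5, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```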
Another one I had fun implementing was WaveNet, as it really forces you to take a deep dive into convolution variations, PixelCNN, and some gated network structures. Then conditioning it was an extra challenge (similar to the attention networks).
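As a rough illustration of what that involves, here is a minimal PyTorch sketch of the kind of gated, dilated causal convolution block WaveNet stacks; the layer sizes are arbitrary, and skip connections and conditioning are left out:

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    # One dilated, causal convolution with the tanh/sigmoid gating used in WaveNet.
    def __init__(self, channels: int, dilation: int, kernel_size: int = 2):
        super().__init__()
        # Left-pad so the convolution never looks at future timesteps (causality).
        self.pad = (kernel_size - 1) * dilation
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.residual = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        padded = nn.functional.pad(x, (self.pad, 0))
        gated = torch.tanh(self.filter_conv(padded)) * torch.sigmoid(self.gate_conv(padded))
        return x + self.residual(gated)  # residual connection keeps the time length

# Exponentially increasing dilations grow the receptive field quickly.
stack = nn.Sequential(*[GatedResidualBlock(64, dilation=2 ** i) for i in range(6)])
out = stack(torch.randn(1, 64, 128))  # -> shape (1, 64, 128)
```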
One thing I've been meaning to get into is DeepCut and other pose-estimation models, because I don't know much about linear programming and the other math they use in those.
neuroguy123 t1_ix5gflf wrote
Reply to [R] Tips on training Transformers by parabellum630
I agree with /u/erannare that data size is likely the most relevant issue. I have done a lot of training on time-series data with Transformers, and they can be quite difficult to train from scratch on medium-sized datasets; this is even noted in the original Transformer paper. Most people simply do not have enough data in new problem spaces to properly take advantage of Transformer models, despite their allure. My suggestions:
All of that being said, test against simpler, more specialized architectures if you have long time-series data. It will take a lot of data and a lot of compute to fully take advantage of a Transformer on its own as an end-to-end architecture. RNNs and WaveNets are still relevant architectures, whether your network is autoregressive, a classifier, or both.