v2thegreat
v2thegreat t1_j5s39fb wrote
It really depends on how often you think you'll train with the model.
If it's something that you'll do daily for at least 3 months, then I'd argue you can justify the 4090.
Otherwise, if this is a single model you want to play around with, then use an appropriate EC2 instance with GPUs (remember: start with a small instance and upgrade as you need more compute, and turn off your instance when you're not using it)
I don't really know what type of data you're playing around with (image, text, or audio, for example), but you should be able to get pretty far without a GPU by doing small-scale experiments and debugging, and then finally using a GPU for the final training run
You can also use TensorFlow datasets, which can stream data from disk during training, meaning that you won't need to hold all of your files in memory and can get away with a fairly decent computer.
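The streaming idea can be sketched in plain Python without TensorFlow (the `batch_stream` helper and the JSON-lines file layout here are hypothetical, just to illustrate the concept):

```python
import json
import os
import tempfile

def batch_stream(paths, batch_size):
    """Yield batches of records, reading one file at a time from disk.

    Only the current file's lines pass through memory, so the full
    dataset never needs to fit in RAM -- the same idea tf.data uses.
    """
    batch = []
    for path in paths:
        with open(path) as f:
            for line in f:
                batch.append(json.loads(line))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:  # flush the final partial batch
        yield batch

# Tiny demo: two JSON-lines "shards" on disk, streamed in batches of 3.
tmp = tempfile.mkdtemp()
paths = []
for i in range(2):
    p = os.path.join(tmp, f"shard{i}.jsonl")
    with open(p, "w") as f:
        for j in range(4):
            f.write(json.dumps({"x": i * 4 + j}) + "\n")
    paths.append(p)

batches = list(batch_stream(paths, batch_size=3))
print([len(b) for b in batches])  # 8 records -> batches of 3, 3, 2
```

In TensorFlow itself, `tf.data.TextLineDataset` plus `.batch()` gives you the same behavior with prefetching for free.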
Good luck!
v2thegreat t1_j2oablu wrote
Reply to comment by oilfee in [D] Simple Questions Thread by AutoModerator
For transformers that's likely a difficult question to answer without experimentation, but I always recommend starting small. It's generally hard enough to go from 0 to 1 without also worrying about scaling things up.
Currently, we're seeing that larger and larger models aren't really slowing down and continue to become more powerful.
I'd say that this deserves its own post rather than a simple question.
Good luck and please respond when you end up solving it!
v2thegreat t1_j2o816v wrote
Reply to comment by oilfee in [D] Simple Questions Thread by AutoModerator
Well, to answer your original question: it depends on what problem you're trying to solve!
In theory, yes, you can work with a large corpus of data with a large language model, but as ChatGPT showed us, a larger model won't always do better; fine-tuning might give better results
I hope this helps!
v2thegreat t1_j2o5v0y wrote
Reply to comment by oilfee in [D] Simple Questions Thread by AutoModerator
It can, but I want to know why you want to use transformers in the first place. Having the entire context is important to avoid solving the wrong problem, especially one that might get expensive depending on what you're trying to do.
v2thegreat t1_j2o58pj wrote
Reply to comment by oilfee in [D] Simple Questions Thread by AutoModerator
Why do you need a transformer model? What problem are you trying to solve?
v2thegreat t1_j2lpumb wrote
Reply to comment by i_likebrains in [D] Simple Questions Thread by AutoModerator
These come under hyperparameter optimization, so you will definitely need to play around with them, but here are my rules of thumb (take them with a grain of salt!)
Learning rate: start with a large learning rate (e.g. 1e-3), and if training misbehaves, reduce it, down toward 1e-6 if needed. There's a Stack Overflow answer that explains this quite well.
Number of epochs: stop right before your model's training loss starts diverging from the validation loss. Plot both curves; where they diverge is where the overfitting happens.
Batch size: as large as will still fit in memory, to speed things up in general
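The epoch rule of thumb above can be sketched in a few lines: pick the epoch just before validation loss starts climbing away from training loss (the loss curves below are made-up numbers, purely for illustration):

```python
def best_epoch(train_loss, val_loss):
    """Return the epoch (0-indexed) just before validation loss
    starts rising, i.e. before the two curves diverge."""
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] > val_loss[epoch - 1]:
            return epoch - 1
    return len(val_loss) - 1  # never diverged: train longer

# Made-up loss curves: validation loss bottoms out at epoch 3.
train = [1.0, 0.6, 0.4, 0.3, 0.25, 0.2]
val   = [1.1, 0.7, 0.5, 0.45, 0.5, 0.6]
print(best_epoch(train, val))  # 3
```

In practice you'd get the same effect with an early-stopping callback that monitors validation loss, rather than eyeballing the plot.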
v2thegreat t1_j2lp4cc wrote
Reply to comment by gmish27 in [D] Simple Questions Thread by AutoModerator
Hmm, this seems like a simple image-detection task that might not even need much machine learning. Please post here if you end up figuring it out!
v2thegreat t1_j2lox9a wrote
Reply to comment by -s-u-n-n-y- in [D] Simple Questions Thread by AutoModerator
Off the top of my head: look into ChatGPT, it's pretty good at this
v2thegreat t1_j2lojaq wrote
Reply to comment by ateeb_khan_13 in [D] Simple Questions Thread by AutoModerator
Deploy like in an app? Or on a website?
v2thegreat t1_j2j0xi3 wrote
Reply to comment by ateeb_khan_13 in [D] Simple Questions Thread by AutoModerator
Depends on what you mean by "deploy" and what the model is expected to do.
Where are you deploying? What does that model do? What problem are you trying to solve?
v2thegreat t1_j60fvmd wrote
Reply to comment by Zealousideal-Copy463 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Well, there are EC2 instances that come already set up. How often do you do this sort of thing? It might be justified to build your own home setup, but as someone who does that myself, I can tell you it's kind of tedious and you end up being your own IT guy