Submitted by ahiddenmessi2 t3_11dzfvf in MachineLearning
[removed]
Thank you. I will take a look at my number of parameters.
I can recommend a free, open-source library to help you train in the cloud if you need more resources: https://skypilot.readthedocs.io/en/latest/
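As a rough idea of what that looks like with SkyPilot's Python API (a minimal sketch based on the linked docs; the setup commands, train.py script, and accelerator choice are illustrative placeholders, so double-check against the current API):

```python
import sky

# Illustrative SkyPilot task: provision one cloud GPU, install deps, run a training script.
# "train.py", the pip packages, and the accelerator are placeholders for your own setup.
task = sky.Task(
    setup="pip install torch transformers",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="V100:1"))

# Launches the task on a cloud you have configured credentials for.
sky.launch(task, cluster_name="transformer-train")
```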
Thank you, I will look into it.
Knowing the architecture isn't enough. How large is your training dataset? Do you use gradient accumulation?
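If you haven't used gradient accumulation before, here is a minimal PyTorch sketch; `model` and `train_loader` are hypothetical placeholders for your own model and data, and the accumulation count is just an example:

```python
import torch

accum_steps = 8  # accumulate 8 small batches to emulate one 8x larger batch
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # model: your transformer (placeholder)

model.train()
optimizer.zero_grad()
for step, batch in enumerate(train_loader):   # train_loader: your DataLoader (placeholder)
    loss = model(**batch).loss / accum_steps  # scale so the accumulated gradients average out
    loss.backward()                           # gradients add up across the accumulation window
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```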
My dataset size can vary because the data can be generated. Also, I will consider using gradient accumulation to improve performance. Thanks
2060 has 6GB of VRAM, right?
It should be possible to train with that amount of memory: https://huggingface.co/docs/transformers/perf_train_gpu_one#optimizer
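The short version of that page, as a sketch with the HF Trainer (exact option availability depends on your transformers version, and the batch sizes here are illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,   # keep the per-step batch small to fit 6 GB
    gradient_accumulation_steps=8,   # effective batch size of 32
    fp16=True,                       # mixed precision roughly halves activation memory
    gradient_checkpointing=True,     # recompute activations to save memory at some speed cost
    optim="adafactor",               # lighter optimizer state than Adam
)
```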
If you need to train from scratch (most people just fine-tune), it will take a while: the original training took 90 hours on 8x V100s, each of which should be faster than your GPU. https://www.arxiv-vanity.com/papers/1910.01108/
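As a very rough back-of-envelope (the 2060 slowdown factor below is purely an assumption for illustration):

```python
v100_gpu_hours = 90 * 8   # pretraining in the paper above: ~90 hours on 8x V100
slowdown = 4              # ASSUMPTION: a laptop 2060 at roughly 1/4 of a V100's training throughput
print(v100_gpu_hours * slowdown, "hours")  # ~2880 hours on one 2060, i.e. months of continuous training
```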
Thanks for your reply. My goal is to train the transformer to read a specific programming language, so I guess there is no pre-trained model available. Seems I have to train it from scratch on my laptop GPU :(
Edit: and yes, it only has 6 GB
For reference, an RTX 3090 can be rented for as little as ~$0.25/hour at vast.ai with just a credit card if you are in a hurry (AWS and GCP require a quota increase to use GPUs), or you may be able to get free research credits from major cloud providers.
ChatGPT uses GPT-3.5, which is a pre-trained model. Google uses pre-trained models. Facebook created a pre-trained model recently.
If these models satisfy their needs, they will definitely satisfy yours. Unless you are tackling a kind of problem that hasn't been tackled before, a pre-trained model will save you a lot of training time and require a lot less data to converge and actually be useful.
Thank you. I am looking at CodeBERT, which might satisfy my needs
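If it does, a quick smoke test to load it (using the public microsoft/codebert-base checkpoint on the HF Hub; the example code string is arbitrary):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def add(a, b): return a + b"
inputs = tokenizer(code, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) for the RoBERTa-base-sized encoder
```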
CKtalon t1_jabpgds wrote
A ~10^7-10^8 parameter model should be possible.
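For a rough sense of why that fits in 6 GB, assuming full fp32 training with Adam at ~16 bytes per parameter (weights, gradients, and two optimizer moments; activations not counted):

```python
params = 100_000_000            # upper end of the ~10^7-10^8 range
bytes_per_param = 4 + 4 + 8     # fp32 weights + gradients + two Adam moments
print(params * bytes_per_param / 1e9, "GB")  # ~1.6 GB, leaving room for activations within 6 GB
```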