Submitted by Business-Lead2679 t3_12618zu in MachineLearning
I apologize if what I'm about to say sounds trivial, but I recently trained the 7B version of LLaMA on my JSON dataset containing 122k questions and answers. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard that the 65B model is significantly better, so I'm interested in training it to see how it performs. I already tried Google Colab (high-RAM), Paperspace, Deepnote, and JetBrains, and they all crashed. How can I realistically train the 65B model on my $1k budget and complete the training without any major issues? Any advice is appreciated.
Business-Lead2679 OP t1_je70nka wrote
I'd like to train it with these settings:
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024
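
For reference, here is a minimal sketch of how those settings could plug into a budget-friendly run. It assumes a parameter-efficient QLoRA setup with the Hugging Face transformers, peft, bitsandbytes, and datasets libraries on a single rented 80GB GPU; the model path, dataset path, and "question"/"answer" field names are placeholders, not something stated in the thread.

# Hypothetical sketch: QLoRA fine-tuning of LLaMA-65B on one large GPU.
# Assumes transformers + peft + bitsandbytes + datasets; paths are placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments, Trainer,
                          DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_NAME = "path/to/llama-65b"   # placeholder: local or hub weights
DATA_PATH = "qa_dataset.json"      # placeholder: the 122k Q&A pairs
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024

# Load the base model in 4-bit NF4 so the 65B weights fit in ~40-48 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Train only small LoRA adapters instead of all 65B parameters.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(example):
    # Assumes each record has "question" and "answer" fields.
    text = f"### Question:\n{example['question']}\n### Answer:\n{example['answer']}"
    return tokenizer(text, truncation=True, max_length=CUTOFF_LEN)

dataset = load_dataset("json", data_files=DATA_PATH, split="train")
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(
        output_dir="llama65b-qlora",
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
        logging_steps=50,
        save_strategy="epoch",
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

The point of the 4-bit quantization plus LoRA adapters is that only a few hundred MB of adapter weights are trained, which is what makes a 65B run on a single rented GPU plausible within roughly a $1k budget; a full-precision full-parameter fine-tune of 65B would need a multi-node cluster.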