Submitted by Business-Lead2679 t3_12618zu in MachineLearning
I apologize if what I'm about to say sounds trivial, but I recently trained the 7B version of LLaMA on my JSON dataset containing 122k questions and answers. The results were quite good, but I noticed that about 30% of the answers could be improved. I've heard that the 65B model is significantly better, so I'm interested in training it to see how it performs. I already tried Google Colab (high-RAM), Paperspace, Deepnote, and JetBrains, and they all crashed. How can I realistically train the 65B model on my $1k budget and complete the training without any major issues? Any advice is appreciated.
Business-Lead2679 OP t1_je70nka wrote
I'd like to train it with these settings:
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024
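
For reference, here is a minimal sketch of how those settings could plug into a budget-friendly run. It assumes a parameter-efficient QLoRA setup with the Hugging Face transformers, peft, bitsandbytes, and datasets libraries on a single rented 80GB GPU; the model path, dataset path, and "question"/"answer" field names are placeholders, not something stated in the thread.

# Hypothetical sketch: QLoRA fine-tuning of LLaMA-65B on one large GPU.
# Assumes transformers + peft + bitsandbytes + datasets; paths are placeholders.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments, Trainer,
                          DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_NAME = "path/to/llama-65b"   # placeholder: local or hub weights
DATA_PATH = "qa_dataset.json"      # placeholder: the 122k Q&A pairs
EPOCHS = 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 1024

# Load the base model in 4-bit NF4 so the 65B weights fit in ~40-48 GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Train only small LoRA adapters instead of all 65B parameters.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

def tokenize(example):
    # Assumes each record has "question" and "answer" fields.
    text = f"### Question:\n{example['question']}\n### Answer:\n{example['answer']}"
    return tokenizer(text, truncation=True, max_length=CUTOFF_LEN)

dataset = load_dataset("json", data_files=DATA_PATH, split="train")
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    train_dataset=dataset,
    args=TrainingArguments(
        output_dir="llama65b-qlora",
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        bf16=True,
        logging_steps=50,
        save_strategy="epoch",
    ),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

The point of the 4-bit quantization plus LoRA adapters is that only a few hundred MB of adapter weights are trained, which is what makes a 65B run on a single rented GPU plausible within roughly a $1k budget; a full-precision full-parameter fine-tune of 65B would need a multi-node cluster.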