jd_3d t1_je7xkwq wrote
Reply to [D] Training a 65b LLaMA model by Business-Lead2679
Having enough VRAM is the key constraint. Even with all the tricks (LoRA, int8 quantization via bitsandbytes) you'll need at least 120GB of VRAM, and a full fine-tune would take even more. I'd go with a 4x or 8x A100 80GB machine, since it won't necessarily be more expensive (training parallelizes well across the GPUs). See here for more info: https://www.storminthecastle.com/posts/alpaca_13B/
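
For reference, here's a minimal sketch of the LoRA + int8 setup I mean, assuming the Hugging Face transformers/peft/bitsandbytes stack. The model path and LoRA hyperparameters are just illustrative, and `prepare_model_for_int8_training` is the older peft name (newer versions call it `prepare_model_for_kbit_training`):

```python
# Sketch: load LLaMA-65B in 8-bit and attach LoRA adapters for fine-tuning.
# Assumes transformers + peft + bitsandbytes are installed; model path and
# hyperparameters below are placeholders, not a tuned recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "huggyllama/llama-65b"  # placeholder; point at whatever checkpoint you have

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit quantizes the weights and device_map="auto" shards them across
# all visible GPUs; the int8 weights alone are roughly 65GB for a 65B model,
# before activations and optimizer state for the adapters.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Prepare the int8 model for training (casts norms to fp32, enables
# gradient checkpointing, etc.).
model = prepare_model_for_int8_training(model)

# Only the small LoRA matrices on the attention projections get trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the 65B params are trainable
```

The point of the sketch: even though LoRA keeps the trainable parameter count tiny, the frozen int8 base weights still have to sit in GPU memory, which is where the 120GB+ figure comes from.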