Submitted by Business-Lead2679 t3_12618zu in MachineLearning
learn-deeply t1_je9eovt wrote
Reply to comment by ustainbolt in [D] Training a 65b LLaMA model by Business-Lead2679
In my experience, tensor parallelism (aka model parallelism) combined with model checkpointing works better than FSDP, though the two can be used in conjunction. FSDP is easier to work with, though.
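For anyone unfamiliar with the term, here is a toy sketch of the idea behind tensor parallelism: a linear layer's weight matrix is split column-wise, each shard computes its slice of the output, and the slices are concatenated. It is pure Python with nested lists, not a real multi-GPU setup; the helper names (`matmul`, `split_columns`, `column_parallel_linear`) and the two "devices" are made up for illustration.

```python
# Toy illustration of tensor (model) parallelism, assuming a column-wise split.
# The "devices" here are hypothetical: everything runs in one process.

def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, parts):
    """Split a matrix into `parts` column shards of equal width."""
    width = len(w[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(parts)]

def column_parallel_linear(x, shards):
    """Each shard computes its slice of the output; slices are concatenated."""
    partials = [matmul(x, shard) for shard in shards]  # one matmul per "device"
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]                        # batch of 2 inputs
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]    # 2 x 4 weight matrix
shards = split_columns(w, parts=2)                  # two 2 x 2 shards
full = matmul(x, w)                                 # unsharded reference
sharded = column_parallel_linear(x, shards)         # same result from shards
```

Each shard only holds half of the weights, which is what lets a model too large for one device be spread across several. In a real setup the concatenation step would be a collective communication between GPUs.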