Submitted by N3urAlgorithm t3_1115h5o in deeplearning
N3urAlgorithm OP t1_j8dqtv4 wrote
Reply to comment by CKtalon in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
Thank you, the TMA is actually a big deal for speeding things up, but from what I've found, even though the 4x RTX setup has more total VRAM, it can't be used for memory pooling. If I'm not mistaken, even with this limitation I can still distribute training across the 4 GPUs, but the model itself is still capped at a single card's 48 GB.
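For reference, a minimal data-parallel sketch of that setup (assuming PyTorch DistributedDataParallel; the model and sizes here are placeholders, not from the thread). Each of the 4 GPUs holds a full replica, so the model plus optimizer state has to fit in one card's 48 GB even though total VRAM is 192 GB:

```python
# Launch with: torchrun --nproc_per_node=4 ddp_train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # NCCL for GPU-to-GPU comms
    local_rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
    torch.cuda.set_device(local_rank)

    # Placeholder model; in practice, whatever fits within a single 48 GB card.
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 4096),
    ).cuda(local_rank)

    ddp_model = DDP(model, device_ids=[local_rank])  # full replica per GPU
    optimizer = torch.optim.AdamW(ddp_model.parameters(), lr=1e-4)

    for _ in range(10):                              # dummy training loop
        x = torch.randn(32, 4096, device=local_rank)
        loss = ddp_model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()                              # grads all-reduced across GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```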
CKtalon t1_j8e0j48 wrote
You’ll have to use something like DeepSpeed to split the layers across multiple GPUs. Of course, if the model can fit on one GPU, then you can go crazier with bigger batch sizes.
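A rough sketch of one way to do that with DeepSpeed: ZeRO stage 3 shards parameters, gradients, and optimizer states across the 4 GPUs (a related but distinct approach from splitting whole layers via pipeline parallelism), letting a model larger than one card's 48 GB train. The config values and model below are illustrative assumptions, not tuned settings:

```python
# Launch with: deepspeed --num_gpus=4 zero3_train.py
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                      # shard params, grads, optimizer states
        "overlap_comm": True,
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# Placeholder; stands in for a model too large for a single GPU.
model = torch.nn.Sequential(
    *[torch.nn.Linear(4096, 4096) for _ in range(8)]
)

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for _ in range(10):                      # dummy training loop
    x = torch.randn(4, 4096, device=model_engine.device, dtype=torch.bfloat16)
    loss = model_engine(x).pow(2).mean()
    model_engine.backward(loss)          # DeepSpeed handles sharded grads/comm
    model_engine.step()
```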