
N3urAlgorithm OP t1_j8dqtv4 wrote

Thank you, TMA is actually a big deal for speeding things up, but from what I've found, even though the 4x RTX setup has more total VRAM, it can't be pooled into a single memory space. If I'm not wrong, even with this limitation I can still distribute training across the 4 GPUs, but each card is still capped at 48GB (see the sketch below).
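
A minimal sketch of what that looks like with plain data parallelism (PyTorch DDP, launched via `torchrun` — the model and script names here are placeholders): every GPU holds a full replica of the model, so each copy must still fit inside a single card's 48GB even though there are four cards.

    # train.py -- illustrative only; replace the placeholder model with yours
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # One process per GPU; torchrun sets LOCAL_RANK for each of the 4 processes.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model: each replica lives entirely on one 48GB card,
    # because DDP replicates the model rather than pooling VRAM.
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # launch with: torchrun --nproc_per_node=4 train.py
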


CKtalon t1_j8e0j48 wrote

You’ll have to use something like DeepSpeed to split the model across multiple GPUs. Of course, if the model can fit on one GPU, then you can go crazier with bigger batch sizes.
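
For reference, a rough sketch of how that can look with DeepSpeed's ZeRO stage 3, which partitions parameters, gradients, and optimizer states across the GPUs instead of keeping a full copy on each card (the model and hyperparameters below are placeholders, not a tested recipe):

    # train_ds.py -- illustrative DeepSpeed setup, assuming 4 local GPUs
    import deepspeed
    import torch.nn as nn

    # Placeholder model; swap in the model you actually want to train.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

    # ZeRO stage 3 shards model states across all GPUs, so the combined
    # VRAM can hold a model that would not fit on a single 48GB card.
    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "zero_optimization": {"stage": 3},
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
        "bf16": {"enabled": True},
    }

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    # launch with: deepspeed --num_gpus 4 train_ds.py
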
