
N3urAlgorithm OP t1_j8dqtv4 wrote

Thank you, TMA is actually a big deal for speeding things up, but from what I've found, even though the 4x RTX setup has more total VRAM, it can't be pooled into a single memory space. If I'm not wrong, even with this limitation I can still distribute training across the 4 GPUs, but each card is still capped at 48GB (see the sketch below).
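
A minimal sketch of what that looks like with plain data parallelism (PyTorch DDP, launched via `torchrun` — the model and script names here are placeholders): every GPU holds a full replica of the model, so each copy must still fit inside a single card's 48GB even though there are four cards.

    # train.py -- illustrative only; replace the placeholder model with yours
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # One process per GPU; torchrun sets LOCAL_RANK for each of the 4 processes.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model: each replica lives entirely on one 48GB card,
    # because DDP replicates the model rather than pooling VRAM.
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # launch with: torchrun --nproc_per_node=4 train.py
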


CKtalon t1_j8e0j48 wrote

You’ll have to use something like DeepSpeed to split the model across multiple GPUs. Of course, if the model can fit on one GPU, then you can go crazier with bigger batch sizes.
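
For reference, a rough sketch of how that can look with DeepSpeed's ZeRO stage 3, which partitions parameters, gradients, and optimizer states across the GPUs instead of keeping a full copy on each card (the model and hyperparameters below are placeholders, not a tested recipe):

    # train_ds.py -- illustrative DeepSpeed setup, assuming 4 local GPUs
    import deepspeed
    import torch.nn as nn

    # Placeholder model; swap in the model you actually want to train.
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

    # ZeRO stage 3 shards model states across all GPUs, so the combined
    # VRAM can hold a model that would not fit on a single 48GB card.
    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "zero_optimization": {"stage": 3},
        "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
        "bf16": {"enabled": True},
    }

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    # launch with: deepspeed --num_gpus 4 train_ds.py
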
