Thank you, the TMA is actually a big deal for speeding things up, but as far as I've found, even though the 4x RTX setup has more total VRAM, it can't be used for memory pooling. If I'm not wrong, even with this limitation I can still distribute training across the 4 GPUs, but each GPU is still capped at 48GB.
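For what it's worth, here's a minimal data-parallel sketch of what that looks like in practice, assuming PyTorch DDP over NCCL, launched with torchrun, and a toy model standing in for whatever you'd actually train. Each of the 4 processes keeps a full copy of the model on its own card, so the per-GPU 48GB is the hard limit rather than a pooled 192GB:

```python
# Sketch only (assumes PyTorch + NCCL, 4 local GPUs, torchrun launch).
# DistributedDataParallel replicates the model per GPU: memory is NOT pooled,
# so the model + optimizer state + activations must fit in one card's 48 GB.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each of the 4 processes (one per GPU)
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Hypothetical placeholder model; swap in your own
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    data = torch.randn(32, 4096, device=local_rank)
    target = torch.randn(32, 4096, device=local_rank)

    for _ in range(10):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(data), target)
        loss.backward()   # gradients are all-reduced across the 4 GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=4 train.py`. To go past the single-card limit you'd need model/tensor parallelism or sharding (e.g. FSDP/DeepSpeed) rather than plain DDP.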
N3urAlgorithm OP t1_j8drfq9 wrote
Reply to comment by lambda_matt in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm
You said that NVIDIA has killed off the DGX workstation, but as far as I can see from here, there's still something for the H100?