ArnoF7 t1_j5lknua wrote
Reply to comment by FastestLearner in [D] Multiple Different GPUs? by Maxerature
I have a similar suspicion, that training will be bottlenecked by the slow 1080. But I am wondering if it's possible to treat the 1080 as a pure VRAM extension?
Although it's possible that the time spent transferring between the different memories makes the gain of having more VRAM pointless.
FastestLearner t1_j5mwz47 wrote
It is possible, but it would require you to write custom code for every memcopy operation that you want to perform, i.e. `tensor.to(device)`, which you can get away with on a smaller project but could become prohibitively cumbersome on a large one. Also, you'd still need to do two forward passes (one with the data already on the 3080, and another with the data from the 1080 after transferring it over to the 3080). Whether or not this is beneficial boils down to the difference in transfer rates between the RAM-3080 route and the RAM-1080-3080 route. I wouldn't be able to tell which one is faster without benchmarking.
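To make that concrete, here's a minimal sketch of the per-tensor bookkeeping involved (assuming `cuda:0` is the 3080 and `cuda:1` is the 1080; the model and tensor shapes are placeholders):

```python
import torch

# Assumption: cuda:0 is the 3080 (compute), cuda:1 is the 1080 (storage only)
compute = torch.device("cuda:0")
storage = torch.device("cuda:1")

model = torch.nn.Conv2d(3, 16, 3).to(compute)  # placeholder model

# Park a batch that won't fit on the 3080 in the 1080's VRAM
spare_batch = torch.randn(256, 3, 224, 224, device=storage)

# Every use of that data needs an explicit cross-device copy first;
# this is the manual .to(device) bookkeeping mentioned above
out = model(spare_batch.to(compute))
```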
DeepSpeed handles the RAM-3080 to-and-fro transfers for large batch sizes automatically.
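For comparison, a sketch of the DeepSpeed route, assuming ZeRO stage 2 with optimizer-state offload to CPU RAM (the model and batch size here are placeholders):

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # placeholder model

# ZeRO stage 2 with optimizer state offloaded to CPU RAM; DeepSpeed
# then manages the RAM/GPU transfers during each step automatically
ds_config = {
    "train_batch_size": 64,
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```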
ArnoF7 t1_j5omrfh wrote
Great insight. Appreciate it.