Comments


FuB4R32 t1_iu41a65 wrote

Is this for training or inference? The easiest thing to do is to split the batch between multiple GPUs. If you can't even fit batch=1 on a single GPU, though, then model parallelism is generally a harder problem.
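Something like this is usually enough to split the batch (PyTorch sketch; `MyModel` and `loader` are placeholders for whatever you're training on):

```python
import torch
import torch.nn as nn

model = MyModel()  # placeholder for your network
if torch.cuda.device_count() > 1:
    # DataParallel splits each incoming batch across the visible GPUs,
    # runs the replicas in parallel, and gathers the outputs on GPU 0
    model = nn.DataParallel(model)
model = model.to("cuda")

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, targets in loader:  # placeholder DataLoader
    optimizer.zero_grad()
    outputs = model(inputs.to("cuda"))
    loss = criterion(outputs, targets.to("cuda"))
    loss.backward()
    optimizer.step()
```

(`DistributedDataParallel` is the faster option in practice, but `DataParallel` is the one-line version.)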

7

sabeansauce OP t1_iu45f4w wrote

For training. Essentially I have to choose between one powerful GPU or multiple average ones. But I know the average ones don't have enough memory on their own for the task at hand (because I have one). I'd prefer the single GPU, but the company is asking whether a multi-GPU setup of lesser capabilities will also work if the GPUs are used together.

3

FuB4R32 t1_iu45yrl wrote

Yeah, I think I understand. E.g. Google Cloud has a great deal on K80s, especially if you commit to the costs up front. If you have even a handful of mid-range GPUs, training should be faster anyway since you can achieve a larger batch size, but it depends on the details ofc.

3

Melodic-Scallion-416 t1_iu4s1r8 wrote

I have been reading articles about OpenAI Triton and how it helps optimize memory usage during GPU processing. I have not used it personally but was planning to try it out to address this same concern.
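For reference, the starter example from the Triton tutorials looks roughly like this (a fused vector add; I haven't run it myself):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # each program instance handles one BLOCK_SIZE chunk of the vectors
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # the add happens in registers, so no intermediate tensor hits GPU memory
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The memory savings come from fusing operations into one kernel like this instead of materializing intermediates, which is a different lever than splitting the model across GPUs.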

1

Long_Two_6176 t1_iu5omuu wrote

This is called model parallelism. Think of it as having model.conv1 on GPU 1 and model.conv2 on GPU 2. This is actually not too hard to do, as you just need to manually place your model components with statements like .to("cuda:0"). Start with this.
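A minimal sketch of what that looks like in PyTorch (layer sizes are just for illustration):

```python
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # first half of the network lives on GPU 0
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1).to("cuda:0")
        # second half lives on GPU 1
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1).to("cuda:1")

    def forward(self, x):
        x = torch.relu(self.conv1(x.to("cuda:0")))
        # activations are copied between GPUs between the two stages
        x = torch.relu(self.conv2(x.to("cuda:1")))
        return x

model = TwoGPUNet()
out = model(torch.randn(1, 3, 224, 224))  # output (and loss) end up on cuda:1
```

Remember to move the labels to the same device as the output before computing the loss.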

A more advanced setup is model parallelism + data parallelism, where you also benefit from having the GPUs split the dataset to accelerate training. Typically this is not possible with simple model parallelism alone, but a framework like fairseq can do it for you.
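If you want to try it without fairseq, PyTorch's DistributedDataParallel can also wrap a module that already spans multiple devices. Rough sketch only: each process would need its own pair of GPUs, so the hard-coded cuda:0/cuda:1 above would have to be offset by rank.

```python
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# assumes the TwoGPUNet above and a process group launched with torchrun
dist.init_process_group(backend="nccl")

model = TwoGPUNet()  # already spread over two GPUs
# for a multi-device module, device_ids must be left as None;
# DDP then only handles gradient synchronization across processes
ddp_model = DDP(model, device_ids=None)
```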

2