Comments


FuB4R32 t1_iu41a65 wrote

Is this for training or inference? The easiest thing to do is to split the batch between multiple GPUs. If you can't even fit batch=1 on a single GPU, though, then model parallelism is generally a harder problem.
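Something like this is usually enough to split the batch (PyTorch sketch; `MyModel` and `loader` are placeholders for whatever you're training on):

```python
import torch
import torch.nn as nn

model = MyModel()  # placeholder for your network
if torch.cuda.device_count() > 1:
    # DataParallel splits each incoming batch across the visible GPUs,
    # runs the replicas in parallel, and gathers the outputs on GPU 0
    model = nn.DataParallel(model)
model = model.to("cuda")

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for inputs, targets in loader:  # placeholder DataLoader
    optimizer.zero_grad()
    outputs = model(inputs.to("cuda"))
    loss = criterion(outputs, targets.to("cuda"))
    loss.backward()
    optimizer.step()
```

(`DistributedDataParallel` is the faster option in practice, but `DataParallel` is the one-line version.)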

7

sabeansauce OP t1_iu45f4w wrote

For training. Essentially I have to choose between one powerful GPU or multiple average ones. But I know the average ones don't have enough memory on their own for the task at hand (because I have one). I'd prefer the single GPU, but the company is asking whether a multi-GPU setup of lesser capabilities will also work if the GPUs are used together.

3

FuB4R32 t1_iu45yrl wrote

Yeah, I think I understand. E.g. Google Cloud has a great deal on K80s, especially if you commit to the costs up front. If you have even a handful of mid-range GPUs, training should be faster anyway since you can achieve a larger batch size, but it depends on the details ofc.

3

Melodic-Scallion-416 t1_iu4s1r8 wrote

I have been reading articles about OpenAI Triton and how it helps optimize memory usage during GPU processing. I have not used it personally but was planning to try it out to address this same concern.
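For reference, the starter example from the Triton tutorials looks roughly like this (a fused vector add; I haven't run it myself):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # each program instance handles one BLOCK_SIZE chunk of the vectors
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # the add happens in registers, so no intermediate tensor hits GPU memory
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The memory savings come from fusing operations into one kernel like this instead of materializing intermediates, which is a different lever than splitting the model across GPUs.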

1

Long_Two_6176 t1_iu5omuu wrote

This is called model parallelism. Think of it as having model.conv1 on GPU 1 and model.conv2 on GPU 2. This is actually not too hard to do, as you just need to manually place your model components with statements like .to("cuda:0"). Start with this.
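A minimal sketch of what that looks like in PyTorch (layer sizes are just for illustration):

```python
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        # first half of the network lives on GPU 0
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1).to("cuda:0")
        # second half lives on GPU 1
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1).to("cuda:1")

    def forward(self, x):
        x = torch.relu(self.conv1(x.to("cuda:0")))
        # activations are copied between GPUs between the two stages
        x = torch.relu(self.conv2(x.to("cuda:1")))
        return x

model = TwoGPUNet()
out = model(torch.randn(1, 3, 224, 224))  # output (and loss) end up on cuda:1
```

Remember to move the labels to the same device as the output before computing the loss.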

A more advanced setup is model parallelism + data parallelism, where you also benefit from having the GPUs split the dataset to accelerate training. Typically this is not possible with simple model parallelism alone, but a framework like fairseq can do it for you.
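If you want to try it without fairseq, PyTorch's DistributedDataParallel can also wrap a module that already spans multiple devices. Rough sketch only: each process would need its own pair of GPUs, so the hard-coded cuda:0/cuda:1 above would have to be offset by rank.

```python
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# assumes the TwoGPUNet above and a process group launched with torchrun
dist.init_process_group(backend="nccl")

model = TwoGPUNet()  # already spread over two GPUs
# for a multi-device module, device_ids must be left as None;
# DDP then only handles gradient synchronization across processes
ddp_model = DDP(model, device_ids=None)
```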

2