Submitted by alexnasla t3_yikumt in MachineLearning
BlazeObsidian t1_iujbbdu wrote
Reply to comment by alexnasla in [D] When the GPU is NOT the bottleneck...? by alexnasla
Are you sure your model is running on the GPU? See https://towardsdatascience.com/pytorch-switching-to-the-gpu-a7c0b21e8a99 — or, if you can check GPU utilisation, that might be a simpler way to verify.
If you are not explicitly moving your model to the GPU, it's running on the CPU. Also, how long is it taking? Do you have a baseline time you compared the performance against?
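A minimal sketch of the check the commenter is describing, assuming a PyTorch model (the toy `nn.Linear` stands in for the poster's actual model): move the model and its inputs to the GPU when one is available, then confirm every parameter reports the expected device.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the poster's network.
model = nn.Linear(16, 4)

# Pick the GPU if one is visible, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Every parameter should now report the chosen device.
print(all(p.device == device for p in model.parameters()))  # True

# Inputs must be moved too, or PyTorch raises a device-mismatch error.
x = torch.randn(8, 16, device=device)
out = model(x)
print(out.shape)  # torch.Size([8, 4])
```

If any parameter still reports `cpu` while a CUDA device is available, the `.to(device)` call is missing somewhere in the training script.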
alexnasla OP t1_iujbukx wrote
I'm pretty sure it's running on the GPU. I don't remember what the GPU utilization was, though; I'll take a look when I get a chance.
The test I mentioned ran for 8 hours.
K-o-s-l-s t1_iujldkh wrote
What are you using to log and monitor your jobs? Knowing CPU, RAM, and GPU utilisation will make this a lot easier to understand.
I agree with the poster above; no appreciable speed-up when switching from a K80 to an A100 makes me suspect that the GPU is not being utilised at all.
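Beyond watching `nvidia-smi` from the shell, a quick in-process check along these lines (a sketch, assuming the job is a PyTorch script) shows whether CUDA is visible at all and whether memory is actually being allocated on the device:

```python
import torch

# Is CUDA visible to this process at all?
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Force an allocation so the memory counter is non-trivial.
    x = torch.randn(1024, 1024, device="cuda")
    print("device:", torch.cuda.get_device_name(0))
    print("allocated MB:", torch.cuda.memory_allocated() / 2**20)
```

High memory use with low compute utilisation in `nvidia-smi` would point to the GPU sitting idle waiting on the CPU-side data pipeline rather than to the GPU itself.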
alexnasla OP t1_iujn3mm wrote
OK, so what I did was actually max out the input buffers to the most the GPU can handle without crashing, so basically fully saturating the VRAM.
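Saturated VRAM does not by itself mean the GPU's compute is the bottleneck; the device can hold a huge batch and still spend most of each step waiting for the CPU. One way to separate the two (a sketch with a hypothetical model and sizes, not the poster's actual setup) is to time the forward/backward step in isolation, remembering that CUDA kernels launch asynchronously and need a `synchronize()` before and after timing:

```python
import time
import torch
import torch.nn as nn

# Hypothetical stand-in model and batch; the point is the timing pattern.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
x = torch.randn(512, 256, device=device)

if device.type == "cuda":
    torch.cuda.synchronize()  # drain pending async kernels before timing
start = time.perf_counter()
loss = model(x).sum()
loss.backward()
if device.type == "cuda":
    torch.cuda.synchronize()  # wait for the step's kernels to finish
step_time = time.perf_counter() - start
print(f"one step: {step_time * 1000:.2f} ms")
```

If this isolated step time is small compared to the wall-clock time per batch in the real run, the bottleneck is outside the GPU (data loading, preprocessing, or host-to-device transfer), which would also explain why a K80 and an A100 finish in the same 8 hours.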