Long_Two_6176
Long_Two_6176 t1_j7gc9rz wrote
Remember also that computation, not just parameter count, costs GPU memory: the intermediate tensors from the forward pass are kept around for backprop. Check your intermediate tensor sizes.
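To make the point concrete, here is a rough back-of-the-envelope sketch in plain Python. The layer (a 3x3 conv with 64 output channels), the batch shape, and the helper names are all made up for illustration, and float32 storage is assumed. It only counts one output tensor, so treat it as a lower bound rather than a real memory profile.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: everything is stored as float32


def conv2d_output_shape(in_shape, out_channels, kernel, stride=1, padding=0):
    """Output shape (N, C, H, W) of a 2D convolution with a square kernel."""
    n, _, h, w = in_shape
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return (n, out_channels, h_out, w_out)


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed to store one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


def conv2d_param_bytes(in_channels, out_channels, kernel,
                       bytes_per_element=BYTES_PER_FLOAT32):
    """Memory for the conv weights plus one bias per output channel."""
    n_params = in_channels * out_channels * kernel * kernel + out_channels
    return n_params * bytes_per_element


# Hypothetical example: a batch of 64 RGB images at 224x224.
batch_shape = (64, 3, 224, 224)
out_shape = conv2d_output_shape(batch_shape, out_channels=64, kernel=3, padding=1)

param_bytes = conv2d_param_bytes(3, 64, 3)
activation_bytes = tensor_bytes(out_shape)

print(f"parameters:  {param_bytes / 2**10:.1f} KiB")
print(f"activations: {activation_bytes / 2**20:.1f} MiB")
```

With these made-up numbers the layer has about 7 KiB of parameters but its output alone takes 784 MiB, which is why a "small" model can still run out of memory.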
Long_Two_6176 t1_iu5omuu wrote
This is called model parallelism. Think of it as having model.conv1 on gpu1 and model.conv2 on gpu2. It is actually not too hard to do, as you just need to manually place your model components with statements like .to("cuda:") with the GPU index appended. Start with this.
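A minimal sketch of that manual placement in PyTorch could look like the following. The class name TwoDeviceNet, the helper pick_devices, and the layer sizes are made up for illustration. The fallback to a single GPU or the CPU is only there so the snippet runs on any machine; with fewer than two GPUs it is no longer real model parallelism.

```python
import torch
import torch.nn as nn


def pick_devices():
    """Return two devices: separate GPUs if present, otherwise a fallback."""
    if torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    if torch.cuda.is_available():
        return torch.device("cuda:0"), torch.device("cuda:0")
    return torch.device("cpu"), torch.device("cpu")


class TwoDeviceNet(nn.Module):
    """Hypothetical two-layer model with each layer on its own device."""

    def __init__(self, dev1, dev2):
        super().__init__()
        self.dev1, self.dev2 = dev1, dev2
        # Each component is placed manually with .to(device).
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1).to(dev1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1).to(dev2)

    def forward(self, x):
        x = torch.relu(self.conv1(x.to(self.dev1)))
        # The activation has to be moved to wherever the next layer lives.
        return self.conv2(x.to(self.dev2))


dev1, dev2 = pick_devices()
model = TwoDeviceNet(dev1, dev2)
out = model(torch.randn(4, 1, 28, 28))
print(out.shape, out.device)
```

Note that the layers still run one after the other, so this buys you memory headroom, not speed. The output ends up on the second device, so the labels for the loss need to be moved there too.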
A more advanced setup is model parallelism + data parallelism, where both GPUs also split the dataset to accelerate training. Typically this is not possible with simple model parallelism, but a toolkit like fairseq can do it for you.
Long_Two_6176 t1_irc2dtw wrote
Reply to comment by Prinzessid in [D] How hard is it to join a lab during Master's? by Ok-Experience5604
You learn on the job. First-year PhD students feel the same way.
Long_Two_6176 t1_j8h28vw wrote
Reply to MacBook Air vs Pro by Fun-Cartographer8611
So today I found out that mps on PyTorch 1.13.1 (stable) has bugs that keep the model from learning. FashionMNIST accuracy bounced around below 10%. Switched to cpu and it worked fine (>80%).
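One cheap way to spot this kind of backend problem is to run the same computation on the CPU and on the accelerator and compare. The snippet below is only a sketch: the layer size, the tolerance, and the helper name pick_accelerator are arbitrary choices, and it checks a single forward pass, so passing it does not prove that training on mps is fine. On a machine without mps it just compares the CPU against itself.

```python
import copy

import torch


def pick_accelerator():
    """Use mps when PyTorch reports it as available, otherwise the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


torch.manual_seed(0)
layer = torch.nn.Linear(784, 10)   # FashionMNIST-sized input, 10 classes
x = torch.randn(32, 784)

with torch.no_grad():
    ref = layer(x)                 # reference result on the CPU

    device = pick_accelerator()
    layer_dev = copy.deepcopy(layer).to(device)
    out = layer_dev(x.to(device)).cpu()

max_diff = (ref - out).abs().max().item()
matches = torch.allclose(ref, out, atol=1e-4)  # arbitrary tolerance
print(f"device={device}, max abs diff={max_diff:.2e}, matches={matches}")
```

If the two results disagree by a lot, or the accuracy only collapses on one device, fall back to torch.device("cpu") before debugging the model itself.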