Long_Two_6176 t1_j8h28vw wrote on February 14, 2023 at 6:26 AM

Reply to MacBook Air vs Pro by Fun-Cartographer8611

So today I found out that the mps backend on PyTorch 1.13.1 (stable) has bugs that prevent learning. FashionMNIST accuracies bounced around below 10%, i.e. no better than chance for 10 classes. Switched to cpu and it worked fine (>80%).

Long_Two_6176 t1_j7gc9rz wrote on February 6, 2023 at 4:32 PM

Reply to Why does my Transformer blow GPU memory? by beautyofdeduction

Remember also that computations, not just parameter count, cost GPU memory: the intermediate tensors produced in the forward pass have to be stored too. Check your intermediate tensor sizes.
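To make that concrete, here is a rough back-of-the-envelope sketch for one intermediate tensor in a Transformer, the attention score matrix, which has shape (batch, heads, seq_len, seq_len). The function name and the example sizes are made up for illustration, and this counts only a single layer's scores in float32; a real model holds many more intermediates.

```python
def attention_scores_bytes(batch, heads, seq_len, bytes_per_elem=4):
    """Bytes for one (batch, heads, seq_len, seq_len) attention score tensor.

    Hypothetical helper: covers one layer's scores only, ignoring every
    other activation, the gradients, and the optimizer state.
    """
    return batch * heads * seq_len * seq_len * bytes_per_elem


# Illustrative sizes (assumptions, not taken from the original question).
size = attention_scores_bytes(batch=8, heads=12, seq_len=2048)
print(f"{size / 2**30:.2f} GiB per layer for the attention scores alone")
```

Note that this grows with the square of the sequence length and does not depend on the parameter count at all, which is why a model with few parameters can still run out of memory.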

Long_Two_6176 t1_iu5omuu wrote on October 28, 2022 at 7:05 PM

Reply to Question about using more than one gpu for deeplearning tasks. by sabeansauce

This is called model parallelism. Think of this as having model.conv1 on gpu1 and model.conv2 on gpu2. This is actually not too hard to do, as you just need to manually place your model components on devices with statements like .to("cuda:"), and move the activations across when they pass from one component to the next. Start with this.
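A minimal sketch of that idea in PyTorch, assuming two GPUs visible as cuda:0 and cuda:1 (the device indices, the class name TwoDeviceNet, and the layer sizes are all made up for illustration). It falls back to CPU for both halves if two GPUs are not available, so it still runs but no longer splits anything.

```python
import torch
import torch.nn as nn

# Assumed device layout: two GPUs. Fall back to CPU so the sketch still runs.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")


class TwoDeviceNet(nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        # Each component is placed on its own device.
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1).to(dev0)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1).to(dev1)

    def forward(self, x):
        x = torch.relu(self.conv1(x.to(dev0)))
        # The activation has to be moved to the second device by hand.
        return self.conv2(x.to(dev1))


model = TwoDeviceNet()
out = model(torch.randn(4, 1, 28, 28))  # FashionMNIST-sized dummy batch
print(out.shape, out.device)
```

The output lives on the second device, so the loss and the labels need to be there as well. The devices work one after another in this setup, so it helps with fitting the model in memory, not with speed.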

A more advanced setup is model parallelism + data parallelism, where you also benefit from having both gpus split the dataset to accelerate training. Typically this is not possible with simple model parallelism, but an advanced library like fairseq can do it for you.

Long_Two_6176 t1_irc2dtw wrote on October 6, 2022 at 9:51 PM

Reply to comment by Prinzessid in [D] How hard is it to join a lab during Master's? by Ok-Experience5604

You learn on the job. First-year PhD students feel the same way.