vade t1_j0gpows wrote
Reply to comment by Outrageous_Room_3167 in I have 6x3090 looking to build a rig by Outrageous_Room_3167
It’s fine - we haven’t had a huge issue with it, but we work on video-related projects, so memory is always a boon. That’s all!
vade t1_j0gp6ha wrote
Reply to comment by Outrageous_Room_3167 in I have 6x3090 looking to build a rig by Outrageous_Room_3167
We have a Ryzen 3950X with maxed-out RAM (128 GB), but I’d like to get a system that supports more RAM and x16 on all PCIe slots. But alas - $$$
vade t1_j0dl85z wrote
I run 3x 3090s in a single case, without water cooling, using one PCIe riser and keeping the case open to allow for airflow. This is on a single 1600 W PSU, no NVLink.
Anything more would be tough without a custom loop and dual PSUs.
Works great!
edit: I use a Fractal Design Define XL and mount one 3090 FE vertically with a riser. It’s janky but works.
vade t1_iumm84n wrote
So there are a few ways to think about this, and some things to know:
A) The Apple Neural Engine is designed for inference workloads, not backprop or training, as far as I’m aware.
B) This means GPU or CPU only for DL training.
C) You can get partial GPU acceleration using PyTorch or TensorFlow, but neither is fully optimized or really competitive (see the sketch after this list).
D) You can accept the training wheels (pun intended) and train simple models using the Create ML GUI, which has about as good M-series GPU support as you’ll get, but it’s woefully out of date for many classes of problems and doesn’t support arbitrary layers, losses, optimizers, etc. It’s a black box.
E) You can use the Create ML API to get a tad more control, but not much more.
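For context on (C), here’s a minimal sketch of what partial GPU acceleration via PyTorch’s MPS backend looks like; the model and tensor shapes are just placeholders:

```python
import torch

# Use Apple's Metal Performance Shaders (MPS) backend if this build of
# PyTorch supports it and the hardware has it; otherwise fall back to CPU.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)

# Forward and backward both run on the GPU, but plenty of ops are still
# unimplemented or fall back to CPU, which is where the "not fully
# optimized" caveat bites.
loss = model(x).sum()
loss.backward()
```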
If you’re interested in Core ML for inference, I’ll say from experience that model conversion is non-trivial if you want performance: some layers don’t always convert appropriately, and shapes can’t always be deduced, depending on the model’s source code.
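As a rough illustration, the usual coremltools path looks like the sketch below; pinning an explicit input shape (rather than relying on shape deduction) is often what makes or breaks the conversion. The model, input name, and shape here are assumptions for the example:

```python
import torch
import coremltools as ct

# Trace the PyTorch model first; conversion operates on the traced graph.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Give coremltools an explicit, fixed input shape so it doesn't have to
# deduce one -- ambiguous shapes and unsupported layers are where
# conversions tend to fall over.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example.shape)],
)
mlmodel.save("model.mlpackage")
```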
Also, Core ML inference in Python doesn’t properly support batching. I’m not joking.
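Concretely, the Python bindings make you feed samples one at a time, so the “batch” ends up being a loop, something like this (the input name and model file are assumptions carried over from the conversion sketch above):

```python
import numpy as np
import coremltools as ct

mlmodel = ct.models.MLModel("model.mlpackage")
batch = np.random.rand(32, 3, 224, 224).astype(np.float32)

# No real batched predict in the Python API: run each sample
# individually and collect the results yourself.
outputs = [
    mlmodel.predict({"input": sample[None, ...]})  # keep a batch dim of 1
    for sample in batch
]
```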
All in all, if you get simple shit working it’s fast, but if you want anything remotely nuanced or not out of the box, you’re fucked unless you want to write custom Metal re-implementations of things like NMS so you can get access to outputs Apple’s layers don’t supply.
Source: banging my head against the fucking wall.
vade t1_iqohymz wrote
Running training will throttle the GPU quickly due to heat buildup. These systems simply have limited airflow and can’t sustain very high clock speeds under sustained load, especially ML training loads, which really put systems under duress.
I don’t know enough about either build, but it’s something to be aware of.
vade t1_j6xeylg wrote
Reply to [D] Apple's ane-transformers - experiences? by alkibijad
FWIW, a colleague of mine is working on this and is also hitting some hiccups. I’ve pointed them to this thread :)