lambda_matt t1_j8oozo1 wrote on February 15, 2023 at 9:18 PM

Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm

Now has rtx Ada 6k

lambda_matt t1_j8facir wrote on February 13, 2023 at 9:50 PM

Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm

Short answer is, it’s complicated. Some workloads can handle being distributed across slower memory buses.

Frameworks have also implemented strategies for doing single-node distributed training: https://pytorch.org/tutorials/beginner/dist_overview.html
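To make the idea concrete, here is a toy sketch of what data-parallel training does on a single node. It is plain Python, not PyTorch's API, and the function names (`grad_mse`, `data_parallel_grad`) and the one-parameter model are made up for illustration. Each "worker" computes a gradient on its shard of the batch, then the gradients are averaged. That averaging step is the part that has to cross the bus between GPUs.

```python
# Toy illustration only: plain-Python data parallelism, not a framework API.
# The model is y = w * x with a mean-squared-error loss (an assumption made
# purely to keep the example small).

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Split the batch into equal shards, compute one local gradient per
    worker, then average them (the job an all-reduce does on real GPUs)."""
    if len(xs) % num_workers != 0:
        raise ValueError("batch size must divide evenly across workers")
    shard = len(xs) // num_workers
    local_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(local_grads) / num_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)                             # whole batch, one device
split = data_parallel_grad(0.5, xs, ys, num_workers=2)   # two equal shards
print(full, split)
```

With equal-sized shards the averaged gradient matches the full-batch gradient, so the only extra cost is the communication, which is why the interconnect matters.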

lambda_matt t1_j8estga wrote on February 13, 2023 at 7:57 PM

Reply to comment by N3urAlgorithm in GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm

That’s a server. The DGX Station was a workstation based on downclocked V100/A100 GPUs.

https://images.nvidia.com/aem-dam/Solutions/Data-Center/nvidia-dgx-station-a100-infographic.pdf

lambda_matt t1_j8db78v wrote on February 13, 2023 at 1:46 PM

Reply to GPU comparisons: RTX 6000 ADA vs Hopper h100 by N3urAlgorithm

No more NVLink on the-cards-formerly-known-as-Quadro, so if your models are VRAM-hungry you may be constrained by the RTX 6000 Ada cards. PCIe 5 and Genoa/Sapphire Rapids might even this out, but I am not on the product development side of things, am not fully up to speed on next-gen, and there have been lots of delays on the CPUs/motherboards.
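For a rough feel of how much the link speed matters, here is a back-of-the-envelope sketch. It assumes a ring all-reduce, where each GPU sends and receives 2 * (N - 1) / N of the payload, and it ignores latency and any overlap with compute. The bandwidth figures are approximate nominal per-direction numbers that I am assuming here, and the 2 GB payload (roughly 1B parameters of fp16 gradients) is also just an example, so treat the output as an estimate, not a benchmark.

```python
def ring_allreduce_seconds(payload_gb, num_gpus, link_gb_per_s):
    """Bandwidth-only estimate of one ring all-reduce.

    Each GPU sends and receives 2 * (N - 1) / N of the payload.
    Latency and compute/communication overlap are ignored.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    traffic_gb = 2.0 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / link_gb_per_s


# Assumed, approximate per-direction bandwidths; check your own hardware.
LINKS_GB_PER_S = {
    "PCIe 4.0 x16": 32.0,
    "PCIe 5.0 x16": 64.0,
}

GRADIENT_PAYLOAD_GB = 2.0  # assumed: about 1B parameters in fp16

for name, bandwidth in LINKS_GB_PER_S.items():
    seconds = ring_allreduce_seconds(GRADIENT_PAYLOAD_GB, 2, bandwidth)
    print(f"{name}: {seconds * 1000:.1f} ms per gradient sync")
```

You can add an entry for an NVLink bridge using the figure from the card's datasheet to see how the gap compares.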

Also, the TDPs for pretty much all of the Ada cards are massive, which will make multi-GPU configurations difficult and likely limited to 2x.

NVIDIA has killed off the DGX workstation, so they are pretty committed to keeping the H100s a server platform.

There still isn’t much real world info, as there are very few of any of these cards in the wild.

Here are some benchmarks for the H100, at least, which are useful for comparing to Ampere-gen: https://lambdalabs.com/gpu-benchmarks

Disclaimer: I work for Lambda