Submitted by mippie_moe t3_ym5b6h in MachineLearning

RTX 4090 vs RTX 3090 Deep Learning Benchmarks

Some RTX 4090 Highlights:

  • 24 GB memory, priced at $1599.
  • RTX 4090's Training throughput and Training throughput/$ are significantly higher than RTX 3090 across the deep learning models we tested, including use cases in vision, language, speech, and recommendation systems.
  • RTX 4090's Training throughput/Watt is close to RTX 3090, despite its high 450W power consumption.
  • Multi-GPU training scales decently in our 2x GPU tests.
79

Comments

Zer01123 t1_iv2lvf4 wrote

The training throughput/$ seems off, in my opinion, because it uses the official prices instead of the street prices:

  • the 3090 is around 1.1k € in Germany/Europe
  • the 4090 is around 2.3k € in Germany/Europe

With those numbers, I don't think the 4090 can beat the 3090 in price to performance.

The 4090 would need to have double the performance of the 3090 to make it worth it.

But it's interesting to see how the performance scales across multiple GPUs.

42

suflaj t1_iv2z6fz wrote

Just a small correction: according to Geizhals, the RTX 4090 is 2030€ and the 3090 is 1150€. So the 4090 would, at this point, need to be around 176% as powerful. But the prices of the 4090 will fall and those of the 3090 will rise, so comparing on MSRP makes sense, since it is more stable than street prices.
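A quick sketch of that arithmetic for anyone who wants to plug in their own numbers; the prices below are the Geizhals street prices from this comment:

```python
# Breakeven check: how much faster the 4090 must be to match the 3090
# on training throughput per euro at these street prices.
price_3090_eur = 1150  # Geizhals street price quoted above (approximate)
price_4090_eur = 2030  # Geizhals street price quoted above (approximate)

breakeven_speedup = price_4090_eur / price_3090_eur
print(f"4090 needs to be {breakeven_speedup:.2f}x faster to tie on throughput/EUR")
# -> about 1.77x, i.e. the ~176% figure above
```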

5

bellyflop111 t1_iv40ead wrote

Don't forget it consumes more power, so the performance increase needs to account for that.

4

suflaj t1_iv5iar6 wrote

That would be another metric, something like performance per watt per dollar, which is not included in the benchmark and is probably uninteresting to most people, since cards like the 3060 would then come out ahead.
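If anyone does want that metric, it's trivial to compute; every number below is an illustrative placeholder, not a benchmark result:

```python
# Hypothetical composite metric: throughput per watt per dollar.
# Every number below is an illustrative placeholder, not a measurement.
cards = {
    #            images/sec, watts, USD
    "RTX 3060": (100, 170, 329),
    "RTX 3090": (220, 350, 1499),
    "RTX 4090": (330, 450, 1599),
}

for name, (throughput, watts, price) in cards.items():
    score = throughput / (watts * price)
    print(f"{name}: {score:.2e} images/sec per (W * $)")
```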

−1

zaphdingbatman t1_iv5ok40 wrote

So long as MSRP continues to be more fantasy than reality, I am not interested in seeing it in comparisons. I will become interested only after it becomes reality again. It might be a while.

4

suflaj t1_iv5prfi wrote

That's on you tbh

I don't think it's very scientific to judge properties of a card based on the whim of an unregulated agent. That way we could conclude that ancient cards, which are basically worthless, are the best.

But other than that, I don't think there's a single person who would have recommended anything other than a 3090, even before these benchmarks. It's the same situation as the 1080 Ti vs the 2080 Ti: the 3090 is just too good of a card.

0

killver t1_iv2t4ll wrote

Yeah, not sure where they get this conclusion from.

3

nmkd t1_iv3miyj wrote

4090 starts at 1950€ here in Germany

3

chatterbox272 t1_iv3tw2v wrote

Street prices vary over time and location. For example, I have zero issues getting a 4090 at RRP where I am in Australia. Using RRP for comparisons makes the comparisons more universal and evergreen.

If you're considering a new GPU you'll need to know the state of your local market, so you can take the findings from Lambda, apply some knowledge about your local market (e.g. 4090s are actually 1.5x RRP right now where you are, or whatever), and then you can redraw your own conclusions. Alternatively, if they were using "street prices" I'd also have to know the state of their local market (wherever that happens to be) at the time of writing (whenever that happens to be), then work out the conversion from their market to mine.

2

chuanli11 t1_ivhlelj wrote

We used the recommended retail price from NVIDIA but OMG they are expensive on the street Lol

1

learn-deeply t1_iv3tuol wrote

Thanks for creating the benchmark!

FYI, these results aren't exactly accurate because CUDA 12, which supports the Hopper architecture, isn't out yet, so none of the fp8 cores are being used and it's not taking advantage of optimizations specific to Hopper. From the NVIDIA whitepaper:

> With the new FP8 format, the GeForce RTX 4090 delivers 1.3 PetaFLOPS of performance for AI inference workloads.

CUDA 12 will be released sometime in 2023, whenever they start delivering H100 GPUs, and it'll take some time for frameworks to add support.

Also, the multi-GPU test is lacking some details that would be really helpful to know: how many PCIe 4.0 lanes is each GPU using? Is the test doing model parallelism or data parallelism?

12

Flag_Red t1_iv4h3kz wrote

I'm super hyped for fp8 support in CUDA. Combined with some other techniques it could put LLM inference (GPT-175B, for example) in reach of consumer hardware.
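Some rough napkin math on why fp8 alone isn't enough and those other techniques still matter (weights only, ignoring activations and the KV cache):

```python
# Back-of-the-envelope weight memory for a 175B-parameter model,
# ignoring activations, KV cache, and framework overhead.
params = 175e9

for fmt, bytes_per_param in [("fp32", 4), ("fp16", 2), ("fp8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:,.0f} GiB of weights")
# Even at fp8 that's ~163 GiB of weights alone, so a single 24 GB card
# still needs offloading, sharding, or more aggressive compression on top.
```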

7

whata_wonderful_day t1_iv57znb wrote

Performance will definitely get better as time goes on, but fp8 is going to be extra work to use, just like fp16.
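For context, this is roughly the opt-in plumbing fp16 mixed precision already needs in PyTorch today (the model, optimizer, and data below are placeholders); fp8 will presumably need something similar:

```python
import torch

# Placeholder model, optimizer, and data; the point is the autocast/GradScaler plumbing.
model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss to avoid fp16 gradient underflow
x = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():           # run eligible ops in fp16, the rest in fp32
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()             # backward pass on the scaled loss
scaler.step(optimizer)                    # unscales gradients, then steps the optimizer
scaler.update()                           # adjusts the scale factor for the next step
```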

5

chuanli11 t1_ivhkx9p wrote

Hey, thanks for the comment. We made sure each GPU uses x16 PCIe 4.0 lanes. It is data parallel (PyTorch DDP, specifically).

We look forward to the FP8/CUDA 12 update too.
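For anyone curious, a minimal sketch of that kind of single-node DDP setup (the model, data, and loop are placeholders, not the actual benchmark code):

```python
# Minimal single-node DDP sketch (placeholder model, data, and loop).
# Launch with: torchrun --nproc_per_node=2 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")            # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])       # set by torchrun
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(1024, 10).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(10):                        # placeholder training loop
        x = torch.randn(64, 1024, device=rank)
        y = torch.randint(0, 10, (64,), device=rank)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()                        # gradients are all-reduced across GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```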

2

onyx-zero-software t1_iv57n5r wrote

> In summary, the GeForce RTX 4090 is a great card for deep learning, particularly for budget-conscious creators, students, and researchers.

Lol what?

11

husmen93 t1_iva9gsq wrote

>students

So as a student in Finland, I can choose between paying 6 months of my rent or buying an RTX 4090 xD

5

wen_mars t1_iv4savp wrote

> The reference prices for RTX 3090 and RTX 4090 are $1400 and $1599, respectively.

Use realistic prices and the results look very different.

> Depending on the model, its TF32 training throughput is between 1.3x to 1.9x higher than RTX 3090.

> Similarly, RTX 4090's FP16 training throughput is between 1.3x to 1.8x higher than RTX 3090.

8

whata_wonderful_day t1_iv20f1u wrote

Awesome, I much appreciate the detailed benchmarks! The dual-GPU scaling in particular was of interest to me. I was wondering how the lack of NVLink would affect things.

BERT large benchmarks would also be great, if you could do them?

6

killver t1_iv2swqz wrote

Thanks for that - unfortunately it confirms that it is performing worse than many were hoping.

5

MrAcurite t1_iv3kj8y wrote

Between the less than advertised performance and the stories about literally melting the power cables, I think I'm gonna stick with my underclocked eBay 3090 for the time being.

2

MisterManuscript t1_iv460q1 wrote

That extra money spent on the RTX 4090 is gonna go to waste since you're not using the built-in optical-flow accelerators for training; they're designed and optimized specifically, and only, to compute optical-flow fields for games efficiently. Better off sticking to the RTX 3090.

3

wen_mars t1_iv4s1ic wrote

The optical flow processors are only a small part of the chip. It has way higher tensor compute available to AI. The real weakness is the limited memory bandwidth.
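Some rough spec-sheet math on that point (the TFLOPS and bandwidth figures below are approximate published numbers, treated here as assumptions):

```python
# Approximate spec-sheet figures: dense FP16 tensor TFLOPS and memory bandwidth in GB/s.
# Rough published numbers used as assumptions, not measurements.
specs = {
    "RTX 3090": (142, 936),
    "RTX 4090": (330, 1008),
}

for name, (tflops, gbps) in specs.items():
    # FLOPs available per byte of memory traffic: the higher this gets,
    # the more often bandwidth-bound layers leave the tensor cores idle.
    print(f"{name}: ~{tflops * 1e12 / (gbps * 1e9):.0f} FLOPs per byte")
# Tensor compute roughly doubled while bandwidth grew only ~8%,
# so memory-bound workloads see much smaller gains than compute-bound ones.
```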

5

Krokodeale t1_iv3cv96 wrote

I'm really curious where the 4080 16GB will sit, since it's supposed to be a couple hundred more than the 3090.

2

husmen93 t1_iv7v18k wrote

>I'm really curious where the 4080 16GB will sit, since it's supposed to be a couple hundred more than the 3090

And the effect of a shorter memory bus on the new models in general; after all, last-gen cards were mostly memory-bandwidth bottlenecked.

2

Krokodeale t1_iv7zv22 wrote

So the 3090 Ti would be a better choice than the 4080 anyway?

3

husmen93 t1_iva8w7u wrote

I think that's very likely to be the case, but real-world results remain to be seen; NVIDIA might have done other optimizations, after all.

I am interested in a more "budget-friendly" comparison between the upcoming RTX 4070 and the RTX 3080 12GB: same memory size and potentially similar pricing, but the 4070 has 30% more compute while the 3080 has 80% more bandwidth.

The numbers are from here: RTX 3080 12GB | RTX 4070

3

tysam_and_co t1_iv3qkvj wrote

Thanks for the comparisons. Multi-GPU is always an interesting one. Hopefully they get things ironed out; there are things on this architecture that I really do like a lot :)

1

Fleischhauf t1_iyam8rb wrote

How does the 3090 Ti perform in comparison? Is it more in the middle, or rather similar to the 3090? I'm trying to decide between a 4090 and a 3090 Ti right now...

1

danielfm123 t1_iv22wjz wrote

I just want to play games

−26

The-Protomolecule t1_iv2kznh wrote

You’re in the wrong subreddit.

15

danielfm123 t1_iv2m5zt wrote

True, but ML is not only about neural networks.

−15

The-Protomolecule t1_iv2mond wrote

Ok. What does that have to do with your comment about gaming, and being in the wrong subreddit?

And it's a DEEP learning benchmark, which is heavily focused on neural networks.

10