Submitted by TheButteryNoodle t3_zau0uc in deeplearning
DingWrong t1_iyq0nr0 wrote
Reply to comment by computing_professor in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Big models get sharded and the chunks get loaded onto each GPU. There are a lot of frameworks ready for this, since the big NLP models can't fit on a single GPU. Alpa even shards the model across different machines. A minimal sketch of what that looks like is below.
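Here's a rough sketch of the single-machine case using Hugging Face Accelerate's `device_map="auto"` (one of those frameworks); the checkpoint name is just a placeholder, not something from this thread:

```python
# Sketch: load a model too big for one GPU by letting Accelerate
# split its layers across all available GPUs automatically.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # placeholder checkpoint, swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # shard layers across the visible GPUs
    torch_dtype=torch.float16,  # half precision to cut VRAM usage
)

# Inputs go to the first shard; Accelerate moves activations between
# GPUs as they flow through the layers.
inputs = tokenizer("Sharding test:", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this gives you pipeline-style sharding (each GPU holds a slice of the layers), which is different from the GPUs pooling their VRAM into one address space.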
computing_professor t1_iyqaku8 wrote
Thanks. So it really isn't the same as how the Quadro cards share VRAM. That's really confusing.