Submitted by TheButteryNoodle t3_zau0uc in deeplearning
DingWrong t1_iyq0nr0 wrote
Reply to comment by computing_professor in GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s by TheButteryNoodle
Big models get sharded and the chunks get loaded onto each GPU. There are a lot of frameworks ready for this, since the big NLP models can't fit on a single GPU. Alpa even shards the model across different machines. A minimal sketch of what that looks like is below.
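Here's a rough sketch of the single-machine case using Hugging Face Accelerate's `device_map="auto"` (one of those frameworks); the checkpoint name is just a placeholder, not something from this thread:

```python
# Sketch: load a model too big for one GPU by letting Accelerate
# split its layers across all available GPUs automatically.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"  # placeholder checkpoint, swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # shard layers across the visible GPUs
    torch_dtype=torch.float16,  # half precision to cut VRAM usage
)

# Inputs go to the first shard; Accelerate moves activations between
# GPUs as they flow through the layers.
inputs = tokenizer("Sharding test:", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this gives you pipeline-style sharding (each GPU holds a slice of the layers), which is different from the GPUs pooling their VRAM into one address space.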
computing_professor t1_iyqaku8 wrote
Thanks. So it really isn't the same as how the Quadro cards share VRAM. That's really confusing.