joossss OP t1_j3q9uip wrote
Reply to comment by learn-deeply in [D] Deep Learning Training Server by joossss
Only this server is planned. I just went with the recommendation on NVIDIA's website, which states 100 Gbps per A100, but that figure makes more sense now that I think about it in the context of distributed training. What NIC speed seems enough in that case?
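If it helps, here is a back-of-the-envelope sketch for sizing the NIC for data-parallel training across servers. Every number in it (model size, node count, step time) is a made-up placeholder, not something from this thread, so plug in your own figures.

```python
# Back-of-the-envelope sketch of per-node NIC bandwidth for data-parallel
# training. All numbers below are illustrative placeholders.
params = 1e9            # model size: 1B parameters (assumption)
bytes_per_grad = 2      # fp16 gradients
nodes = 2               # servers taking part in the all-reduce (assumption)
step_time_s = 0.5       # wall-clock time per training step (assumption)

# A ring all-reduce moves roughly 2*(n-1)/n of the gradient bytes per node per step.
grad_bytes = params * bytes_per_grad
traffic_bytes = 2 * (nodes - 1) / nodes * grad_bytes
required_gbps = traffic_bytes * 8 / step_time_s / 1e9
print(f"~{required_gbps:.0f} Gbps per node just for gradient sync")
```

With these placeholder numbers it comes out around 32 Gbps per node, which is why single-node setups rarely need the full 100 Gbps per GPU that multi-node clusters do.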
joossss OP t1_j3q9qe0 wrote
Reply to comment by Cosmic_peach94 in [D] Deep Learning Training Server by joossss
Thanks for the info! I was wondering how to do that.
joossss OP t1_j3q9m8r wrote
Reply to comment by TrueBirch in [D] Deep Learning Training Server by joossss
The main reason for not going to the cloud is that we are a research institution: our funding is project-based, meaning we have to spend it within the allotted time. The second reason is that we already have the GPUs, so the server pays for itself faster.
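For what it's worth, the payback arithmetic is easy to sketch; the prices below are hypothetical placeholders, not our actual costs.

```python
# Rough payback-period sketch; every price here is a hypothetical placeholder,
# not a figure from this thread.
server_cost = 20_000.0          # remaining hardware cost (GPUs already owned)
cloud_rate_per_hour = 8.0       # comparable multi-GPU cloud instance
hours_per_month = 400           # expected utilisation

monthly_cloud_cost = cloud_rate_per_hour * hours_per_month
payback_months = server_cost / monthly_cloud_cost
print(f"The server pays for itself in ~{payback_months:.1f} months at this utilisation")
```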
joossss OP t1_j3lj1su wrote
Reply to comment by Weary-Marionberry-15 in [D] Deep Learning Training Server by joossss
Thanks! The newest Threadrippers are still based on Zen 3, so they don't support AVX-512. Would definitely like to go with A100s, but we don't have the budget for that.
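For anyone who wants to verify AVX-512 support on a given box, a quick check (Linux only, since it reads /proc/cpuinfo) is:

```python
# Quick check for AVX-512 support; Linux-only since it reads /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("AVX-512 available" if "avx512f" in flags else "No AVX-512 (e.g. Zen 3 Threadripper)")
```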
Submitted by joossss t3_107c95i in MachineLearning
joossss OP t1_j3qnc04 wrote
Reply to comment by learn-deeply in [D] Deep Learning Training Server by joossss
Yeah true and thanks :)
I did not remove it. It was removed by the moderators for some reason.