
LoaderD t1_jbw3640 wrote

> In reality you can easily fit the 65B version in 2 A100 with 100G of VRAM.

Ughhh, are you telling me I have to SSH into my DGX 100 instead of just using my local machine with 1 A100? (Satire, I am a broke student.)

Appreciate the implementation and transparency. I don't think many people realize how big a 65B-parameter model is, since there's no cost associated with downloading one.
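For a rough sense of scale, here is a back-of-the-envelope sketch of the memory needed just to hold the weights at a few common precisions. The helper name `weight_memory_gb` is made up for illustration, 1 GB is taken as 1e9 bytes, and the bytes-per-parameter values are the standard sizes for fp32, fp16, and int8; nothing here comes from the implementation being discussed.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (assuming 1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 65e9  # 65B parameters, as in the quoted comment

# Bytes per parameter for each precision (illustrative, weights only).
footprints = {
    "fp32": weight_memory_gb(N_PARAMS, 4),
    "fp16": weight_memory_gb(N_PARAMS, 2),
    "int8": weight_memory_gb(N_PARAMS, 1),
}

for name, gb in footprints.items():
    print(f"{name}: ~{gb:.0f} GB")
```

This counts weights only. Activations and the attention cache add to it at inference time, so the real requirement is somewhat higher than these figures.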
