LoaderD t1_jbw3640 wrote
> In reality you can easily fit the 65B version in 2 A100 with 100G of VRAM.
Ughhh, are you telling me I have to SSH into my DGX 100 instead of just using my local machine with 1 A100? (Satire, I am a broke student.)
Appreciate the implementation and transparency. I don't think many people realize how big a 65B-parameter model is, since there's no cost associated with downloading one.
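
For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory the weights alone would take at a few common precisions. The function name and the list of precisions are my own assumptions for illustration (the thread doesn't say which precision is used), and this ignores activations, KV cache, and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights only, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 65e9  # 65B parameters, as discussed in the thread

# Assumed precisions, for illustration only.
PRECISIONS = [("fp32", 4), ("fp16", 2), ("int8", 1), ("int4", 0.5)]

for name, nbytes in PRECISIONS:
    print(f"{name}: ~{weight_memory_gb(N_PARAMS, nbytes):.1f} GB")
```

By this estimate, fp16 weights alone would come to about 130 GB, and only 8-bit or lower gets under 100 GB, before counting any runtime overhead.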