[D] Large Language Models feasible to run on 32 GB RAM / 8 GB VRAM / 24 GB VRAM — submitted by head_robotics on February 20, 2023 at 9:33 AM in MachineLearning (51 comments, 220 points)
halixness wrote on February 21, 2023 at 7:22 AM: So far I have tried BLOOM over Petals (a system for distributed LLM inference); generation took around 30 s per prompt on an 8 GB VRAM GPU. Not bad! (1 point)
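For anyone curious what that setup involves, below is a minimal sketch of a Petals client modeled on the Petals README from around that time. The `bigscience/bloom-petals` checkpoint name and the `DistributedBloomForCausalLM` class are taken from that README and are assumptions here; later Petals versions may expose different names.

```python
# Minimal Petals client sketch (assumptions noted above): only the
# embeddings run locally, while the transformer blocks are served by
# remote peers in the swarm, which is why BLOOM fits on an 8 GB GPU.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # public swarm serving BLOOM

tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME).cuda()

# Tokenize a prompt and generate; each step makes a round trip through
# the swarm, hence latencies on the order of tens of seconds per prompt.
inputs = tokenizer("A quick test prompt:", return_tensors="pt")["input_ids"].cuda()
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```

The round trips through remote peers are what dominate the ~30 s latency the comment reports, rather than local compute.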