Submitted by head_robotics t3_1172jrs in MachineLearning
head_robotics OP t1_j99tts4 wrote
Reply to comment by Disastrous_Elk_6375 in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Did you use something like bitsandbytes for the 8-bit inference?
How did you implement it?
Disastrous_Elk_6375 t1_j99ujv1 wrote
Add device_map="auto" and load_in_8bit=True to your .from_pretrained() call, e.g. .from_pretrained("model", device_map="auto", load_in_8bit=True).
Transformers does the rest.
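The tip above, sketched out a bit more fully. The actual from_pretrained call needs a CUDA GPU, the bitsandbytes package, and a model download, so it is shown commented out; the model id "facebook/opt-1.3b" and the choose_load_kwargs helper (with its 24 GB threshold) are illustrative assumptions, not part of Transformers:

```python
# Sketch of 8-bit loading with Hugging Face Transformers + bitsandbytes.
# The call below requires a CUDA GPU, bitsandbytes installed, and a model
# download, so it is left commented out here.
#
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-1.3b",   # illustrative model id; substitute your own
#     device_map="auto",     # let accelerate place layers on GPU/CPU
#     load_in_8bit=True,     # bitsandbytes int8 quantization of weights
# )

def choose_load_kwargs(vram_gb: float) -> dict:
    """Hypothetical helper: pick from_pretrained kwargs for a VRAM budget.

    The 24 GB cutoff is illustrative. int8 weights take roughly half the
    memory of fp16, which is what makes 8-bit loading attractive on the
    8 GB / 24 GB cards discussed in this thread.
    """
    if vram_gb < 24:
        return {"device_map": "auto", "load_in_8bit": True}
    return {"device_map": "auto"}
```

Usage would then be model = AutoModelForCausalLM.from_pretrained("model-name", **choose_load_kwargs(8)).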