Submitted by head_robotics t3_1172jrs in MachineLearning
head_robotics OP t1_j99tts4 wrote
Reply to comment by Disastrous_Elk_6375 in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Did you use something like bitsandbytes for the 8-bit inference?
How did you implement it?
Disastrous_Elk_6375 t1_j99ujv1 wrote
Add device_map="auto" and load_in_8bit=True to your .from_pretrained() call, e.g. .from_pretrained("model", device_map="auto", load_in_8bit=True).
Transformers does the rest.
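The tip above, sketched out a bit more fully. The actual from_pretrained call needs a CUDA GPU, the bitsandbytes package, and a model download, so it is shown commented out; the model id "facebook/opt-1.3b" and the choose_load_kwargs helper (with its 24 GB threshold) are illustrative assumptions, not part of Transformers:

```python
# Sketch of 8-bit loading with Hugging Face Transformers + bitsandbytes.
# The call below requires a CUDA GPU, bitsandbytes installed, and a model
# download, so it is left commented out here.
#
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "facebook/opt-1.3b",   # illustrative model id; substitute your own
#     device_map="auto",     # let accelerate place layers on GPU/CPU
#     load_in_8bit=True,     # bitsandbytes int8 quantization of weights
# )

def choose_load_kwargs(vram_gb: float) -> dict:
    """Hypothetical helper: pick from_pretrained kwargs for a VRAM budget.

    The 24 GB cutoff is illustrative. int8 weights take roughly half the
    memory of fp16, which is what makes 8-bit loading attractive on the
    8 GB / 24 GB cards discussed in this thread.
    """
    if vram_gb < 24:
        return {"device_map": "auto", "load_in_8bit": True}
    return {"device_map": "auto"}
```

Usage would then be model = AutoModelForCausalLM.from_pretrained("model-name", **choose_load_kwargs(8)).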