Submitted by head_robotics t3_1172jrs in MachineLearning
Disastrous_Elk_6375 t1_j99ry6s wrote
GPT-NeoX should fit in 24GB VRAM with 8bit, for inference.
I managed to run GPT-J 6B on a 3060 w/ 12GB and it takes about 7.2GB of VRAM.
ArmagedonAshhole t1_j99tr0r wrote
>GPT-NeoX should fit in 24GB VRAM with 8bit, for inference.
GPT-NeoX 20B will fit in 24GB of VRAM, but it will go out of memory almost instantly once the context grows much beyond the first page or so of text.
Disastrous_Elk_6375 t1_j99xxfa wrote
Are there rough numbers on prompt size vs. RAM usage after the model loads? I haven't played with GPT-NeoX yet.
ArmagedonAshhole t1_j9a1vq3 wrote
It depends mostly on the settings, so no.
A small context of around 200-300 tokens could work with 24GB, but then your AI won't remember much or connect the dots well, which would make the model worse than a 13B one.
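For a rough sense of why a longer context eats memory so quickly, here is a back-of-envelope KV-cache estimate (assuming GPT-NeoX-20B's published shape of 44 layers and a 6144 hidden size, and that the cache stays in fp16 even when the weights are loaded in 8-bit; activation overhead comes on top):

    # Rough KV-cache cost per token for GPT-NeoX-20B, assuming an fp16 cache
    layers, d_model, bytes_fp16 = 44, 6144, 2
    per_token = 2 * layers * d_model * bytes_fp16          # K and V for every layer
    print(per_token / 2**20, "MiB per token")              # ~1 MiB per token
    print(2048 * per_token / 2**30, "GiB at 2048 tokens")  # ~2.1 GiB on top of ~20GB of weights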
People are working right now on splitting the work between the GPU (VRAM) and CPU (RAM) in 8-bit mode. I think offloading around 10% to RAM would make the model work well on a 24GB VRAM card. It would be a bit slower but still usable.
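A minimal sketch of what that split might look like with the transformers/accelerate/bitsandbytes stack (the memory caps and the fp32-offload flag are illustrative and version-dependent):

    # Sketch: keep most 8-bit layers on the GPU, spill the rest to system RAM
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_enable_fp32_cpu_offload=True,   # offloaded modules stay in fp32 on the CPU
    )
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-neox-20b",
        device_map="auto",                        # let accelerate place the layers
        max_memory={0: "20GiB", "cpu": "48GiB"},  # cap GPU usage, push the rest to RAM
        quantization_config=quant_config,
    )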
If you want, you can always load the whole model into RAM and run it on the CPU, but that is very slow.
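The CPU-only fallback would look roughly like this (bitsandbytes needs CUDA, so the weights are loaded unquantized and need on the order of 40-80GB of system RAM depending on dtype):

    # CPU-only fallback: no 8-bit here, everything lives in system RAM
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-neox-20b",
        torch_dtype=torch.float32,   # safest dtype for CPU inference; ~80GB RAM at fp32
    )                                # no device_map, so the model stays on the CPU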
Disastrous_Elk_6375 t1_j9a2877 wrote
Thanks!
head_robotics OP t1_j99tts4 wrote
Did you use something like bitsandbytes for the 8bit inference?
How did you implement it?
Disastrous_Elk_6375 t1_j99ujv1 wrote
Add device_map="auto", load_in_8bit=True to your .from_pretrained() call, i.e. .from_pretrained("model", device_map="auto", load_in_8bit=True).
Transformers does the rest.
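In full form that would be something like the following (the model name and prompt are just placeholders; bitsandbytes and accelerate need to be installed):

    # 8-bit inference with just the two extra from_pretrained arguments
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/gpt-j-6B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",    # accelerate places the layers on the available GPU(s)
        load_in_8bit=True,    # bitsandbytes quantizes the weights to 8-bit at load time
    )

    inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output[0], skip_special_tokens=True))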