Disastrous_Elk_6375 t1_jc3e9ao wrote
Reply to comment by luaks1337 in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
With 8-bit this should fit on a 3060 12GB, which is pretty affordable right now. If this works as well as they state, it's going to be amazing.
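For context, a minimal sketch of what 8-bit loading looks like with the Hugging Face transformers + bitsandbytes integration. This assumes the LLaMA weights have already been converted to Hugging Face format and that `bitsandbytes` and `accelerate` are installed; the model path is a placeholder, not an official checkpoint name.

```python
# Minimal sketch: load a ~7B model in 8-bit on a single 12GB GPU.
# Assumes bitsandbytes and accelerate are installed and the LLaMA
# weights have been converted to Hugging Face format (path is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder local path

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # let accelerate place layers on the GPU
)

prompt = "Explain what instruction tuning is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```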
atlast_a_redditor t1_jc3jzcf wrote
I know nothing about this stuff, but I'd rather have the 4-bit 13B model for my 3060 12GB. From what I've read, quantisation has less of an effect on larger models.
disgruntled_pie t1_jc4ffo1 wrote
I’ve successfully run the 13B-parameter version of LLaMA on my 2080 Ti (11GB of VRAM) in 4-bit mode, and performance was pretty good.
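For a rough sense of why that fits, here's a back-of-the-envelope estimate of the VRAM taken by the weights alone. It ignores activations, the KV cache, and framework overhead, so treat the numbers as a lower bound rather than a precise requirement.

```python
# Back-of-the-envelope VRAM estimate for quantized weights only.
# Activations, KV cache, and framework overhead add more on top.
def weight_vram_gib(n_params_billion: float, bits_per_weight: int) -> float:
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for n_params, bits in [(7, 8), (13, 4), (13, 8)]:
    print(f"{n_params}B @ {bits}-bit: ~{weight_vram_gib(n_params, bits):.1f} GiB")
# 7B  @ 8-bit: ~6.5 GiB
# 13B @ 4-bit: ~6.1 GiB
# 13B @ 8-bit: ~12.1 GiB
```

So a 4-bit 13B model's weights come in around 6 GiB, which is why it can squeeze onto an 11GB card with room left for the rest.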
pilibitti t1_jc56vv5 wrote
Hey, do you have a link to how one might set this up?
disgruntled_pie t1_jc5g6or wrote
I’m using this project: https://github.com/oobabooga/text-generation-webui
The project’s GitHub wiki has a page on LLaMA that explains everything you need.
pdaddyo t1_jc5uoly wrote
And if you get stuck check out /r/oobabooga
sneakpeekbot t1_jc5upgp wrote
Here's a sneak peek of /r/Oobabooga using the top posts of all time!
#1: The new streaming algorithm has been merged. It's a lot faster! | 6 comments
#2: Text streaming will become 1000000x faster tomorrow
#3: LLaMA tutorial (including 4-bit mode) | 10 comments
pilibitti t1_jc5was5 wrote
thank you!