Submitted by rustymonster2000 t3_11w8lp2 in MachineLearning
Civil_Collection7267 t1_jcx9jri wrote
LLaMA 13B/30B and LLaMA 7B with the Alpaca LoRA are the best that can be run locally on consumer hardware. LLaMA 65B exists but I wouldn't count that as something that can be run locally by most people.
From my own testing, the 7B model with the LoRA is comparable to 13B in coherence, and it's generally better than the recently released OpenAssistant model. If you'd like to see some examples, I answered many prompts in an r/singularity AMA for Alpaca. Go to this post and sort by new to see the responses. I continued where the OP left off.
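As a rough illustration of what "run locally" means in practice, here's a minimal sketch of loading the 7B model in 8-bit on a single consumer GPU, assuming the Hugging Face transformers, accelerate, and bitsandbytes libraries are installed; the checkpoint ID is just illustrative, any local LLaMA weights work the same way:

```python
# Minimal sketch: loading LLaMA 7B in 8-bit so it fits on a consumer GPU.
# Assumes transformers, accelerate, and bitsandbytes; the model ID is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decapoda-research/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize weights to 8-bit to roughly halve VRAM use
    device_map="auto",   # let accelerate place layers across GPU/CPU as needed
)

prompt = "Explain what a LoRA adapter does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```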
kross00 t1_jczd3i2 wrote
I'm having a hard time understanding what LoRA is and why it makes the 7B model better. I thought it only improved hardware requirements, but does it also improve model coherence? This is all new to me.
ericflo t1_jczqkmj wrote
LoRA is how you fine-tune LLaMA into Alpaca on consumer hardware: instead of updating all of the base model's weights, you train small low-rank matrices that are added on top of them, which is why it fits on a single GPU.
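A rough sketch of what that setup looks like, assuming the Hugging Face peft library; the rank and target modules below are typical values from the alpaca-lora project, not the only valid choices:

```python
# Minimal sketch: wrapping a base LLaMA model with LoRA adapters for
# instruction tuning, assuming the Hugging Face peft library.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base weights
# ...then train `model` on the Alpaca instruction dataset as usual.
```

Only the adapter matrices receive gradients, so the optimizer state is tiny compared to full fine-tuning.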
nolimyn t1_jd01nm3 wrote
The LoRA is like a modular refinement of the base language model; in this case, it's the part that makes the model feel like a chatbot/assistant and follow instructions.

You can see the same concept over at civitai.com if you filter by LoRAs: a LoRA for one character can be run on different checkpoints that focus on photorealism, anime, etc.
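To make the "modular" point concrete, here's a hedged sketch of attaching a published adapter to a base checkpoint with peft; the adapter ID is the one from the alpaca-lora project, used here purely as an illustration, and swapping in a different adapter on the same base works the same way:

```python
# Minimal sketch: attaching a LoRA adapter to a base checkpoint,
# assuming transformers + peft. IDs are illustrative.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

# The same base model can host different adapters:
assistant = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")

# Optionally fold the low-rank update into the base weights for faster inference:
merged = assistant.merge_and_unload()
```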
tungns91 t1_jcxkd5z wrote
Do you have a specific chart comparing consumer hardware against the performance of LLaMA 7B through 65B? Like, I want to know if my poor gaming PC could produce a response in under a minute.
Civil_Collection7267 t1_jczrmem wrote
Tom's Hardware has an article on that: https://www.tomshardware.com/news/running-your-own-chatbot-on-a-single-gpu