Submitted by rustymonster2000 t3_11w8lp2 in MachineLearning
Civil_Collection7267 t1_jcx9jri wrote
LLaMA 13B/30B and LLaMA 7B with the Alpaca LoRA are the best that can be run locally on consumer hardware. LLaMA 65B exists but I wouldn't count that as something that can be run locally by most people.
From my own testing, the 7B model with the LoRA is comparable to 13B in coherence, and it's generally better than the recently released OpenAssistant model. If you'd like to see some examples, I answered many prompts in an r/singularity AMA for Alpaca. Go to this post and sort by new to see the responses. I continued where the OP left off.
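As a rough illustration of what "run locally" means in practice, here's a minimal sketch of loading the 7B model in 8-bit on a single consumer GPU, assuming the Hugging Face transformers, accelerate, and bitsandbytes libraries are installed; the checkpoint ID is just illustrative, any local LLaMA weights work the same way:

```python
# Minimal sketch: loading LLaMA 7B in 8-bit so it fits on a consumer GPU.
# Assumes transformers, accelerate, and bitsandbytes; the model ID is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "decapoda-research/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,   # quantize weights to 8-bit to roughly halve VRAM use
    device_map="auto",   # let accelerate place layers across GPU/CPU as needed
)

prompt = "Explain what a LoRA adapter does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```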
kross00 t1_jczd3i2 wrote
I'm having a hard time understanding what LoRA is and why it makes the 7B model better. I thought it only improved hardware requirements, but does it also improve model coherence? This is all new to me.
ericflo t1_jczqkmj wrote
LoRA is how you fine-tune LLaMA into Alpaca on consumer hardware: instead of updating all of the base model's weights, you train small low-rank matrices that are added on top of them, which is why it fits on a single GPU.
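A rough sketch of what that setup looks like, assuming the Hugging Face peft library; the rank and target modules below are typical values from the alpaca-lora project, not the only valid choices:

```python
# Minimal sketch: wrapping a base LLaMA model with LoRA adapters for
# instruction tuning, assuming the Hugging Face peft library.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base weights
# ...then train `model` on the Alpaca instruction dataset as usual.
```

Only the adapter matrices receive gradients, so the optimizer state is tiny compared to full fine-tuning.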
nolimyn t1_jd01nm3 wrote
The LoRA is like a modular refinement of the base language model; in this case, it's the part that makes the model feel like a chatbot/assistant and follow instructions.

You can see the same concept over at civitai.com if you filter by LoRAs: a LoRA for one character can be run on different checkpoints that focus on photorealism, anime, etc.
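To make the "modular" point concrete, here's a hedged sketch of attaching a published adapter to a base checkpoint with peft; the adapter ID is the one from the alpaca-lora project, used here purely as an illustration, and swapping in a different adapter on the same base works the same way:

```python
# Minimal sketch: attaching a LoRA adapter to a base checkpoint,
# assuming transformers + peft. IDs are illustrative.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

# The same base model can host different adapters:
assistant = PeftModel.from_pretrained(base, "tloen/alpaca-lora-7b")

# Optionally fold the low-rank update into the base weights for faster inference:
merged = assistant.merge_and_unload()
```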
tungns91 t1_jcxkd5z wrote
Do you have a specific chart comparing consumer hardware against the performance of LLaMA 7B through 65B? Like, I want to know if my poor gaming PC could produce a response in under a minute.
Civil_Collection7267 t1_jczrmem wrote
Tom's Hardware has an article on that: https://www.tomshardware.com/news/running-your-own-chatbot-on-a-single-gpu