remghoost7 t1_jcq8emm wrote
Reply to comment by LetMeGuessYourAlts in [P] Web Stable Diffusion by crowwork
Ah, it was made with Unreal....? I didn't see that.
I always love seeing video game engines adapted for other uses. It's one of the reasons I've been a huge fan of Unity for years: it's essentially just a wrapper for C# code with a pretty interface.
remghoost7 t1_jcnvpqh wrote
Reply to [P] Web Stable Diffusion by crowwork
Very interesting....
Reminds me of how some VR apps can run natively in browsers using hardware acceleration, I believe. I'm guessing this is doing something similar....? Could be entirely wrong though.
Cool stuff though. Would be neat to make an extension of this for A1111.... Not to diminish the work you've done, but it would probably get more exposure that way (since it's the most used Stable Diffusion front end out there).
remghoost7 t1_jc0bymy wrote
Reply to comment by toothpastespiders in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
I'm having an issue with the C++ compiler on the last step.
I've been trying to use Python 3.10.9 though, so maybe that's my problem....? My venv is set up correctly as well.
Not specifically looking for help.
Apparently this person posted a guide on it in that subreddit. Will report back if I am successful.
edit - Success! But using WSL instead of Windows (because getting it working on Windows was a freaking headache). WSL worked the first time following the instructions on the GitHub page. I'd highly recommend installing through WSL instead of trying to force Windows to figure it out.
remghoost7 t1_jbzro03 wrote
Reply to comment by The_frozen_one in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Nice!
How's the generation speed...?
remghoost7 t1_jbzqf5m wrote
Reply to comment by Amazing_Painter_7692 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Most excellent. Thank you so much! I will look into all of these.
Guess I know what I'm doing for the rest of the day. Time to make more coffee! haha.
You are my new favorite person this week.
Also, one final question, if you will: what's so unique about the 4-bit weights, and why would you prefer to run it that way? Is it just for VRAM optimization....? I'm decently versed in Stable Diffusion, but LLMs are fairly new territory for me.
My question seems to have been answered here, and it is a VRAM limitation. Also, that last link seems to support 4-bit models as well. Doesn't seem too bad to set up.... Though I installed A1111 when it first came out, so I learned through the garbage of that. Lol. I was wrong. Oh so wrong. haha.
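For anyone else wondering, here's the back-of-envelope math that convinced me (my own numbers for weight storage only, not from the linked post):

```python
# Rough VRAM needed just to hold model weights (ignores activations,
# KV cache, and framework overhead). Back-of-envelope only.
def weight_vram_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

llama_13b = 13e9  # roughly 13 billion parameters

print(f"fp16: {weight_vram_gib(llama_13b, 16):.1f} GiB")  # ~24.2 GiB
print(f"int8: {weight_vram_gib(llama_13b, 8):.1f} GiB")   # ~12.1 GiB
print(f"int4: {weight_vram_gib(llama_13b, 4):.1f} GiB")   # ~6.1 GiB
```

So 4-bit takes a 13b model from "needs a datacenter card" down to "fits on a decent consumer GPU", which lines up with the &lt;9 GiB figure in the title once you add runtime overhead on top of the weights.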
Yet again, thank you for your time and have a wonderful rest of your day. <3
remghoost7 t1_jbzmfku wrote
Reply to comment by Amazing_Painter_7692 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Super neat. Thanks for the reply. I'll try that.
Also, do you know if there's a local interface for it....?
I know it's not quite the scope of the post, but it'd be neat to interact with it through a simple Python interface (or something like how Gradio is used for A1111's Stable Diffusion) rather than piping it all through Discord.
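Something like this is roughly what I'm picturing, a minimal Gradio sketch (the model path is a placeholder for whatever local weights you have, not this project's bot):

```python
# Minimal local chat UI sketch using Gradio. The model path below is a
# placeholder, not this project's bot.
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "path/to/local-llama-13b"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

def chat(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(output[0], skip_special_tokens=True)

gr.Interface(fn=chat, inputs="text", outputs="text").launch()
```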
remghoost7 t1_jbz96lt wrote
Reply to [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
><9 GiB VRAM
So does that mean my 1060 6GB can run it....? haha.
I doubt it, but I'll give it a shot later just in case.
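For what it's worth, the napkin math doesn't look great for my card (my own estimate, weights only):

```python
# Can a 6 GiB card hold 13B parameters at 4 bits? Weights only,
# before the KV cache and any runtime overhead.
params = 13e9                          # ~13 billion parameters
weights_gib = params * 0.5 / 1024**3   # 4 bits = 0.5 bytes per param
print(f"~{weights_gib:.1f} GiB of weights alone")  # ~6.1 GiB, already over 6
```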
remghoost7 t1_jd1k0l6 wrote
Reply to comment by wojtek15 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
>...Uniform RAM which can be used by CPU, GPU or Neural Engine.
Interesting....
That would explain why I've seen so many M1 implementations of machine learning models. It really does seem like the M1 chips were made with AI in mind....
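As a concrete example of why that architecture is handy (a minimal sketch, assuming a recent PyTorch build with the MPS backend):

```python
import torch

# On Apple Silicon, the "mps" GPU device draws from the same unified
# memory pool as the CPU, so there's no separate VRAM budget to manage.
device = "mps" if torch.backends.mps.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
print(x.device)
```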