ortegaalfredo t1_je5urre wrote
Reply to [D] llama 7b vs 65b ? by deck4242
I run a discord with all models. Currently only 30B and 65B because nobody uses the smaller LLMs.
Even though superficially they can both answer questions, on complex topics 65B is much better than 30B, and 7B doesn't even compare.
ortegaalfredo OP t1_jbov7dl wrote
Reply to comment by SpaceCockatoo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Tried the 8-bit; 4-bit for some reason doesn't work for me yet.
The problem is that those are very slow, about 1 token/sec, while with 13B I'm getting 100 tokens/sec.
ortegaalfredo OP t1_jbi81mn wrote
Reply to comment by ReginaldIII in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
I posted the GitHub repo in the original post. The output is bad because Meta's original generator is quite bad. I upgraded it today and it's much better now. Still not ChatGPT.
ortegaalfredo OP t1_jbat4qi wrote
Reply to comment by blablanonymous in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Just joking; even unbounded, LLaMA is actually more restrained than the original Bing or jailbroken ChatGPT.
ortegaalfredo OP t1_jbaswga wrote
Reply to comment by polawiaczperel in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Interesting, I'll look into that code more; it's exactly what I need to run 33B.
Currently, using a single card, it's still too slow to use as a chatbot.
ortegaalfredo OP t1_jbaaqv5 wrote
Reply to comment by SrPeixinho in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
The most important thing is to create a multi-process quantization to int8; this would allow it to work with 4x3090 GPU cards. Right now it requires 8x3090 GPUs, and that's way over my budget.
Or just wait a few days; I'm told some guys have 2xA100 cards and will open a 65B model to the public this week.
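For context on those GPU counts, here is a back-of-the-envelope weights-only estimate (my own rough numbers, not exact figures from this thread: roughly 2 bytes/parameter for fp16 and 1 byte/parameter for int8, ignoring activation and KV-cache overhead):

```python
# Rough weights-only VRAM estimate for LLaMA 65B.
# Assumptions: ~2 bytes/param (fp16), ~1 byte/param (int8);
# activation and KV-cache overhead ignored.
PARAMS_B = 65        # model size in billions of parameters
GPU_VRAM_GB = 24     # one RTX 3090

fp16_gb = PARAMS_B * 2   # ~130 GB of weights alone
int8_gb = PARAMS_B * 1   # ~65 GB of weights alone

print(f"fp16 weights: ~{fp16_gb} GB (8x3090 = {8 * GPU_VRAM_GB} GB)")
print(f"int8 weights: ~{int8_gb} GB (4x3090 = {4 * GPU_VRAM_GB} GB)")
```

So int8 halves the footprint enough that 65B weights fit inside 4x3090 (96 GB total), while fp16 needs the full 8-card setup once overhead is included.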
ortegaalfredo OP t1_jbaadnz wrote
Reply to comment by phamtuanminhmeo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Yes, you can send raw prompts using 'raw' like this:
'@ BasedGPT raw The recipe of a chocolate cake is'
This sends whatever you write raw, without any wrapping or added text. But you have to write the prompt as a continuation, like with every other LLM before ChatGPT.
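The two modes can be sketched like this (a hypothetical helper for illustration, not the actual bot code from the repo; the template string is my own assumption):

```python
# Sketch of the two prompt modes described above: 'raw' forwards the
# text untouched, otherwise the message is wrapped in a template.
def build_prompt(message: str, template: str = "Q: {msg}\nA:") -> str:
    """Wrap a user message unless it starts with the 'raw' keyword."""
    if message.startswith("raw "):
        # Raw mode: the model sees exactly what you typed and will
        # continue it, so phrase the prompt as an unfinished sentence.
        return message[len("raw "):]
    return template.format(msg=message)

print(build_prompt("raw The recipe of a chocolate cake is"))
# -> The recipe of a chocolate cake is
```

In raw mode the model simply completes the sentence, which is why "The recipe of a chocolate cake is" works better than "Give me a chocolate cake recipe".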
ortegaalfredo OP t1_jb8ksmz wrote
Reply to comment by ortegaalfredo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
And here is the Discord invite (don't know if the mods will remove this): https://discord.gg/ry4cNFwN
ortegaalfredo OP t1_jb8kdzj wrote
Here are the instructions; all you need is a Discord account. No limits on what you can ask it, and no rules. Please behave, though, as any spam will need to be removed:
https://twitter.com/ortegaalfredo/status/1632903130416308229
Code for the bot is here:
https://github.com/ortegaalfredo/celery-ai/blob/main/discord/bot.py
Submitted by ortegaalfredo t3_11kr20f in MachineLearning
ortegaalfredo t1_jegn9zu wrote
Reply to comment by machineko in [D] llama 7b vs 65b ? by deck4242
2x3090; 65B uses int4, 30B uses int8 (required for LoRA).
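The precision split falls out of the VRAM budget. A rough weights-only estimate (my own approximations: ~0.5 bytes/parameter for int4, ~1 byte/parameter for int8, overhead ignored) shows why 65B needs int4 on this setup while 30B can stay at int8 for LoRA:

```python
# Weights-only VRAM check for a 2x3090 box (assumed ~0.5 B/param
# for int4, ~1 B/param for int8; overhead ignored).
vram_gb = 2 * 24              # 2x3090 = 48 GB total

need_65b_int4 = 65 * 0.5      # ~32.5 GB -- fits
need_65b_int8 = 65 * 1.0      # ~65 GB   -- too big, hence int4
need_30b_int8 = 30 * 1.0      # ~30 GB   -- fits, so int8 LoRA works

print(f"budget {vram_gb} GB: 65B int4 ~{need_65b_int4} GB, "
      f"65B int8 ~{need_65b_int8} GB, 30B int8 ~{need_30b_int8} GB")
```

So 30B is the largest model that fits at int8 on two cards, and since LoRA here requires int8, that's the fine-tunable one; 65B only fits once it drops to int4.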