ortegaalfredo t1_je5urre wrote

I run a Discord with all the models. Currently only 30B and 65B, because nobody uses the smaller LLMs.

Even if, superficially, they can both answer questions, on complex topics 65B is much better than 30B, and 7B doesn't even compare.

11

ortegaalfredo OP t1_jbaaqv5 wrote

The most important thing is to create multi-process int8 quantization; that would allow it to run on 4x3090 GPU cards. Right now it requires 8x3090 GPUs, and that's way over my budget.
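
The arithmetic behind that: 65B parameters at fp16 is roughly 130 GB of weights alone, which is why it currently takes eight 24 GB 3090s; int8 halves the weights to about 65 GB, which can fit across four 3090s (96 GB total) with room left for activations. A minimal sketch of what such a multi-GPU int8 load might look like with Hugging Face transformers + bitsandbytes (the checkpoint name and per-GPU memory caps here are illustrative assumptions, not the exact setup described above):

```python
# Hedged sketch: load a 65B model with int8 weights sharded across several GPUs
# using Hugging Face transformers + bitsandbytes. Checkpoint name and memory
# limits are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-65b"  # hypothetical checkpoint identifier

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 weights: ~65 GB instead of ~130 GB in fp16
    device_map="auto",   # shard layers automatically across all visible GPUs
    max_memory={i: "22GiB" for i in range(4)},  # e.g. 4x RTX 3090 (24 GB each)
)

prompt = "Explain int8 quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```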

Or just wait a few days; I'm told some guys have 2xA100 cards and will open a 65B model to the public this week.

11