Submitted by _underlines_ t3_zstequ in MachineLearning
Edit: Found LAION-AI/Open-Assistant, a very promising project open-sourcing the idea of ChatGPT. Video here
TL;DR: I found GPU compute to be generally cheap: spot or on-demand instances with well over 100 GB of vRAM can be launched on AWS for a few USD per hour. So I thought it would make sense to spin up your own inference endpoint for a SOTA LLM like BLOOMZ 176B whenever you need it to answer a few questions. That still seems more sensible than shoving money into a closed walled garden like "not-so-OpenAI" once they make ChatGPT or GPT-4 available for $$$. But I'm struggling due to a lack of tutorials/resources.
Therefore, I carefully checked benchmarks, model parameters and sizes as well as training sources for all SOTA LLMs here.
Since reading the Chinchilla paper, I've known that OpenAI's original model-scaling claims were off and that more params != better generation quality. So I was looking for the best-performing openly available LLM, in terms of quality and breadth, to use for multilingual everyday questions, code completion, and reasoning, similar to what ChatGPT provides (minus the fine-tuning for chat-style conversations).
My choice fell on BLOOMZ, because it handles multilingual questions well and has good zero-shot performance for instructions and Q&A-style text generation. Confusingly, Galactica seems to outperform BLOOM on several benchmarks, but since Galactica was trained on a very narrow corpus of scientific papers only, I guess its usefulness for answers on non-scientific topics is probably limited.
Therefore I tried running the original BLOOM 176B, and alternatively BLOOMZ 176B, on AWS SageMaker JumpStart, which should be a one-click deployment. It fails after 20 minutes. On Azure ML, I tried using DeepSpeed-MII, which also supports BLOOM, but that fails too, I guess because of the maximum instance size of 12 GB vRAM.
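For reference, this is roughly what a DeepSpeed-MII deployment looks like when the model actually fits on the machine. This is only a sketch based on MII's basic text-generation example, not my working setup: the checkpoint, deployment name, prompt, and generation parameters are placeholders (bloom-560m stands in here, since the 176B model needs a multi-GPU box and MII's int8/fp16 config to fit at all).

```python
import mii

# Stand up a local text-generation deployment for a BLOOM checkpoint.
# bloom-560m is a small stand-in; the 176B model would need several
# large GPUs plus MII's int8/fp16 settings.
mii.deploy(task="text-generation",
           model="bigscience/bloom-560m",
           deployment_name="bloom_deployment")

# Query the running deployment (can be done from another process too).
generator = mii.mii_query_handle("bloom_deployment")
result = generator.query({"query": ["Briefly explain the Chinchilla scaling laws:"]},
                         do_sample=True,
                         max_new_tokens=64)
print(result)
```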
From my understanding, to save costs on inference, it's probably possible to use one or more of the following solutions:
- Precision: int8 instead of fp16 (see the loading sketch after this list)
- Microsoft/DeepSpeed-MII, which claims up to a 40x reduction in inference cost on Azure; it also supports int8 and fp16 BLOOM out of the box, but it fails on Azure for me due to instance size.
- facebook/xformers: not sure, but if I remember correctly this brought inference requirements down to 4 GB vRAM for Stable Diffusion and DreamBooth fine-tuning down to 10 GB. No idea whether it's useful for BLOOM(Z) inference cost reduction, though.
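For the int8 option, the usual route is Hugging Face transformers together with accelerate and bitsandbytes. A minimal sketch, assuming `pip install transformers accelerate bitsandbytes` and a machine whose combined GPU memory is large enough (very roughly 180 GB for the 176B weights in int8, i.e. something like an 8x A100 instance); the prompt is just an example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz"  # 176B; swap for a smaller BLOOMZ variant to test locally

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate shard the layers across all visible GPUs
    load_in_8bit=True,   # bitsandbytes int8 quantization, roughly halves vRAM vs fp16
)

# Inputs go to GPU 0, where the first layers live under device_map="auto".
inputs = tokenizer("Answer in French: What is the capital of Switzerland?",
                   return_tensors="pt").to(0)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The key point is device_map="auto": it spreads the weights over whatever GPUs are visible, which is what makes a single multi-GPU cloud instance feasible instead of a custom model-parallel setup.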
I have a CompSci background, but I'm not familiar with most of this stuff, except that I've been running Stable Diffusion since day one on my RTX 3080 under Linux and also doing fine-tuning with DreamBooth. But that was all just following YouTube tutorials. I can't find a single post or YouTube video of anyone explaining a full BLOOM / Galactica / BLOOMZ inference deployment on cloud platforms like AWS/Azure using one of the optimizations mentioned above, let alone deployment of the raw model. :(
I still can't figure it out by myself after 3 days.
TL;DR2: Trying to find like-minded people who are interested in running open-source SOTA LLMs, for when ChatGPT goes paid or just for fun.
Any comments, inputs, rants, counter-arguments are welcome.
/end of rant
londons_explorer t1_j1a3zrf wrote
I've got a feeling ChatGPT benefits massively from its human-curated fine-tuning feedback loop.
That's hard to reproduce without tens of thousands of man-hours upvoting/downvoting/editing the bot's responses.