Comments


currentscurrents t1_j2cm36p wrote

TL;DR they want to take another language model (Google’s PaLM) and do Reinforcement Learning with Human Feedback (RLHF) on it like OpenAI did for ChatGPT.

At this point they haven't actually done it yet, since they need both compute power and human volunteers to do the training:

>Human volunteers will be employed to rank those responses from best to worst, using the rankings to create a reward model that takes the original model’s responses and sorts them in order of preference, filtering for the top answers to a given prompt.

>However, the process of aligning this model with what users want to accomplish with ChatGPT is both costly and time-consuming, as PaLM has a massive 540 billion parameters. Note that the cost of developing a text-generating model with only 1.5 billion parameters can reach up to $1.6 million.

Since it has 540b parameters, you will still need a GPU cluster to run it.
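
For the curious, the reward model step boils down to a pairwise ranking loss over the human preference labels. A minimal sketch in generic PyTorch (not taken from any actual implementation; reward_model is a stand-in for whatever network maps a token sequence to a scalar score):

    import torch
    import torch.nn.functional as F

    def reward_ranking_loss(reward_model, prompt_ids, better_ids, worse_ids):
        # Score each (prompt, response) pair with a single scalar.
        r_better = reward_model(torch.cat([prompt_ids, better_ids], dim=-1))
        r_worse = reward_model(torch.cat([prompt_ids, worse_ids], dim=-1))
        # Pairwise ranking objective: push the score of the human-preferred
        # response above the score of the less-preferred one.
        return -F.logsigmoid(r_better - r_worse).mean()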

81

PrinceOfLies0 t1_j2d5h95 wrote

If the cost of efficiently running a trained LLM locally comes down to ~100k, it would probably be a worthwhile investment for me. Definitely something to look out for and potentially contribute to. Exciting times :)

2

3deal t1_j2d5iuj wrote

System requirement: 4x RTX 4090

16

Glycerine t1_j2ddrg8 wrote

This went viral pretty quickly. I'm pretty sure this was posted on reddit only a few days ago, about the project going open source: https://github.com/lucidrains/PaLM-rlhf-pytorch

https://old.reddit.com/r/artificial/comments/zy6swx/palm_with_rlhf_is_now_opensource/

I starred it this week at ~50 stars; now it's at 3.3k.


It looks really exciting, but yes, it's not easy to run. Knowing I'm underpowered for most ML work, I still gave it a shot on my AMD 4.0 GHz CPU, 32 GB RAM, and a GTX 1080.

The moment I knew processing Wikipedia was out of reach:

training:   0% 36/100000 [1:01:05<2581:58:40, 92.98s/it]training loss: 2.976300001144409

That shows it took about an hour to reach iteration 36 (of 100K), which works out to roughly 3 months of 24/7 training...
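
Back-of-the-envelope from that progress bar, if anyone wants to check the math (the 92.98 s/it figure is just what tqdm reported on my box):

    # ~92.98 seconds per iteration, 100,000 iterations total
    print(100_000 * 92.98 / 86_400)  # ≈ 107.6 days of 24/7 training, i.e. ~3.5 months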


Secondly, it's not built for daily driving yet; the source is still in dev mode and needs an intermediate Python dev to run it - mostly because of how things are implemented after the training step.


It would be fun to have a slow input system, or some documentation on how to load super thin datasets as an example. A finished model I can run immediately would be awesome - but I guess that's what the other team is doing.


The future of talky speaky machines is getting very exciting; I can't wait to see what happens two more papers down the line... I'm 101% looking forward to my speaky toaster!!!

42

sheeplearning t1_j2dz5eg wrote

I didn't think this sub allowed clickbait content like this. Mods?

10

Mikatron3000 t1_j2e07hz wrote

Now if only this was distributed in some decentralized fashion.

3

IdainaKatarite t1_j2eew0u wrote

Serious answer: you could look into getting compute power from CoreWeave. Obviously, something like this is crucial to humanity (compare Stable Diffusion to everything else). If we let Big Tech control AI Alignment, then very soon its version of reality will be the dominant one (this is the Letter Agencies' wet dream).

This could potentially be one of the most important battles of our lifetime.

4

Ok_Reference_7489 t1_j2ehw9g wrote

lucidrains "implements" all kinds of papers. He has more than 200 such repos. But, as far as I know, he never actually tries to reproduce the results in the papers or run them at any kind of scale. Note that in the readme he points people to other projects.

25

Ronny_Jotten t1_j2esha7 wrote

This is clickbait, there's nothing to see here. Wang, among others, has been working on putting together some code as a kind of "proof of concept" that could do RLHF on top of PaLM. Actually doing that on the scale of ChatGPT, i.e. implementing a large, trained, working system, is a completely different story.

The readme includes this:

> This repository has gone viral without my permission. Next time, if you are promoting my unfinished repositories (notice the work in progress flag) for twitter engagement or eyeballs, at least (1) do your research or (2) be totally transparent with your readers about the capacity of the repository without resorting to clickbait. (1) I was not the first, CarperAI had been working on RLHF months before, link below. (2) There is no trained model. This is just the ship and overall map. We still need millions of dollars of compute + data to sail to the correct point in high dimensional parameter space. Even then, you need professional sailors (like Robin Rombach of Stable Diffusion fame) to actually guide the ship through turbulent times to that point.

34

lucidraisin t1_j2exfpq wrote

my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.

but still it is deceptive to the average person to tell them that chatgpt replication is imminent. good code is just a prerequisite to begin the journey. it will take data, compute, adventurers to actually set sail, and in the case of chatgpt, a complicated process of gathering human feedback (I will do my best to lower the activation energy by building a simple and concise app that covers all cases, assuming RLHF does not get outdated by another technique)

29

EthansWay007 t1_j2eykpr wrote

I’m new to the world of programming (mostly Python). What does this becoming open source mean? That you can view the API to it?

−1

legocuber t1_j2eyv3m wrote

This is kind of clickbait. Cool that they reproduced some of it, but 90% of that had existed since OpenAI released the source code for InstructGPT's architecture. The real limitation is data and compute, which this repo doesn't really provide... What is really required is a huge open-source RLHF dataset (like ImageNet but for human instructions)
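
Concretely, each record in such a dataset would be something along the lines of a prompt plus completions with a human preference label - purely a hypothetical schema to illustrate the shape of the data:

    # One hypothetical record; a reward model would be trained on pairs like this.
    example = {
        "prompt": "Explain what open source means to a beginner.",
        "chosen": "Open source means the code is public, so anyone can read, modify, and redistribute it.",
        "rejected": "It means the program is free to download.",
    }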

3

rawzone t1_j2f45tb wrote

wth!? A pretty decently written short "mainstream" article on a pretty complex tech subject.

−1

Ronny_Jotten t1_j2fcaq1 wrote

> my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.

Sure, but I didn't say anything about your other repositories. I said that this particular repository is a proof of concept, in the sense that it demonstrates working code that could serve in the development of a future open-source ChatGPT-like system, but such a system, as you say, is not imminent. It's great that you're working towards it though!

10

Jean-Porte t1_j2fcs81 wrote

Mom, can we have chatGPT ?

No we have chatGPT at home

ChatGPT at home:

3

Glycerine t1_j2frh3o wrote

You're right, it's poor. All 8 CPUs hit 100%.


As an update though:

I made a bunch of changes: reduced the dataset to 5 lines from Wikipedia, reduced the PaLM size to about 25% of the original, and cut the epochs down to 8.

It's phenomenal. Within < 30 minutes and a bunch of poking it can easily generate sensible sentences.


I dropped it onto a Lambda GPU A100 instance - it's silly fast.

Edit:

As an example: I trained the model on 5 sentences, with an optimal length of ~128 chars. I feed it a word and see what it constructs.

The goal here is to see if it produces sensible sentences from real words:

With a known word the response is fairly stable:

>>> qu('example')
'example, wrote of violence as a necessary and some'
>>> qu('example')
'example, wrote of violence as a necessary and some'
>>> qu('example', 20)
'example, wrote of vi'
>>> qu('example', 10)
'example, w'
>>> qu('example', 50)
'example, wrote of violence as a necessary and some'

Untrained words produce some interesting results. Prior to the <100 epochs of training it was saying nonsense:

tensor(0.0431, grad_fn=<NllLoss2DBackward0>)
>>> qu('when')
'whent he wher a arevo-pociaty on indiviolent resis'
>>> qu('when')
'whent he refuted Nechaev).  Other anarchists, some'
>>> qu('but')
'but. how a free society might be brought about.  H'
>>> qu('but')
'but.  The there is also ofowerat; there is no [[co'
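
(For context, qu is just a tiny helper I hacked together around the trained model - roughly the sketch below, give or take the exact generate signature in the repo version you're on:)

    import torch

    def qu(prompt, length=50, device='cuda'):
        # `model` is the trained (reduced-size) PaLM instance from above.
        # Byte/char-level encode the prompt the same way the training text was fed in.
        prompt_ids = torch.tensor([[ord(c) for c in prompt]], device=device)
        # Continue the sequence; the exact arguments of `generate` may differ
        # between repo versions, so treat this as a sketch rather than gospel.
        out = model.generate(prompt=prompt_ids, seq_len=length)
        text = prompt + ''.join(chr(t) for t in out[0].tolist())
        return text[:length]
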
5

Disastrous_Elk_6375 t1_j2ft5zo wrote

> You're right it's poor. All 8 CPU's hit 100%.

Yeah, you're probably not using the GPU. Make sure that your PyTorch and CUDA versions are compatible and properly installed. To test, go into a Python session and run:


import torch
torch.cuda.is_available()

If the output is False, it will train on the CPU.

7