
Glycerine t1_j2ddrg8 wrote

This went viral pretty quickly. I'm pretty sure this was posted on reddit only a few days ago, when the project went open source: https://github.com/lucidrains/PaLM-rlhf-pytorch

https://old.reddit.com/r/artificial/comments/zy6swx/palm_with_rlhf_is_now_opensource/

I starred it this week at ~50 stars; now it's at 3.3k.


It looks really exciting, but no, it's not easy to run. Knowing I'm underpowered for most ML work, I still gave it a shot on my 4.0 GHz AMD CPU with 32 GB RAM and a GTX 1080.

The moment I knew it was out of reach to process wikipedia:

    training:   0% 36/100000 [1:01:05<2581:58:40, 92.98s/it] training loss: 2.976300001144409

That shows it took about an hour to reach iteration 36 (of 100K), which extrapolates to roughly three and a half months of 24/7 training...
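
Quick back-of-the-envelope to check that ETA:

    # Rough extrapolation from the tqdm readout above.
    secs_per_it = 92.98
    total_iters = 100_000
    hours = secs_per_it * total_iters / 3600
    print(f"{hours:.0f} hours, or about {hours / 24:.0f} days")
    # -> 2583 hours, or about 108 days (~3.5 months)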


Secondly, it's not built for daily driving yet; the source is still in dev mode and needs an intermediate Python dev to execute it - mostly because of what's left to implement after the training step.


It would be fun to have a simple input system, or some documentation on how to load super-thin datasets as an example. A finished model I could run immediately would be awesome - but I guess that's what the other teams are doing.


The future of talky speaky machines is getting very exciting; I can't wait to see what happens two more papers down the line... I'm 101% looking forward to my speaky toaster!!!

42

comefromspace t1_j2dgqhz wrote

> The moment I knew it was out of reach to process wikipedia:

    training:   0%| 274/100000 [10:06<55:51:29,  2.02s/it] training loss: 1.4352326393127441

on a GTX 1650

27

Disastrous_Elk_6375 t1_j2e6d6d wrote

> 92.98s/it

Are your CPUs fully used while training? You might want to check whether this is actually running on the GPU - numbers like that are typical of CPU training.

10

Glycerine t1_j2frh3o wrote

You're right, it's poor. All 8 CPUs hit 100%.


As an update though:

I made a bunch of changes: reduced the dataset to 5 lines from wikipedia, cut the PaLM size to about 25% of the original, and dropped the run to 8 epochs.

It's phenomenal. Within <30 minutes and a bunch of poking it can easily generate sensible sentences.


I dropped it onto a Lambda A100 GPU instance - it's silly fast there.
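
Roughly what the scale-down looks like in code - a sketch following the repo README's constructor, with illustrative numbers rather than my exact values:

    import torch
    from palm_rlhf_pytorch import PaLM

    # A deliberately tiny, char-level PaLM - roughly quarter scale.
    # (Hyperparameters here are illustrative assumptions.)
    model = PaLM(
        num_tokens = 256,  # byte/char-level vocab
        dim = 128,
        depth = 3
    ).cuda()

    # One ~128-char training sample, as token ids.
    seq = torch.randint(0, 256, (1, 128)).cuda()

    loss = model(seq, return_loss = True)
    loss.backward()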

Edit:

As an example: I trained the model on 5 sentences, with an optimal length of ~128 chars. I give it a word and see what it constructs.

The goal here is to see if it produces sensible sentences from real words:

With a known word the response is fairly stable:

    >>> qu('example')
    'example, wrote of violence as a necessary and some'
    >>> qu('example')
    'example, wrote of violence as a necessary and some'
    >>> qu('example', 20)
    'example, wrote of vi'
    >>> qu('example', 10)
    'example, w'
    >>> qu('example', 50)
    'example, wrote of violence as a necessary and some'

Untrained words produce some interesting results. Prior to those <100 epochs of training it was producing nonsense:

    tensor(0.0431, grad_fn=<NllLoss2DBackward0>)
    >>> qu('when')
    'whent he wher a arevo-pociaty on indiviolent resis'
    >>> qu('when')
    'whent he refuted Nechaev).  Other anarchists, some'
    >>> qu('but')
    'but. how a free society might be brought about.  H'
    >>> qu('but')
    'but.  The there is also ofowerat; there is no [[co'
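
(For reference, qu is just a tiny helper around the model's sampler. A hypothetical reconstruction, assuming char-level tokens and the repo's generate method - the exact signature and return shape are assumptions:)

    import torch

    def qu(prompt: str, length: int = 50) -> str:
        """Hypothetical: sample a continuation, cap total output at `length` chars."""
        ids = torch.tensor([[ord(c) for c in prompt]]).cuda()
        out = model.generate(length, prompt = ids)  # assumed signature
        text = prompt + ''.join(chr(i) for i in out[0].tolist())
        return text[:length]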
5

Disastrous_Elk_6375 t1_j2ft5zo wrote

> You're right it's poor. All 8 CPU's hit 100%.

Yeah, you're probably not using the GPU. Make sure your PyTorch and CUDA versions are compatible and properly installed. To test, go into a Python session and run:


    import torch
    torch.cuda.is_available()

If the output is False, it will train on the CPU.
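
A slightly fuller sanity check, using only standard torch introspection calls:

    import torch

    print(torch.__version__)          # pip CPU-only wheels end in "+cpu"
    print(torch.version.cuda)         # None on a CPU-only build
    print(torch.cuda.is_available())  # must be True to train on the GPU
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. 'NVIDIA GeForce GTX 1080'

And remember the model and batches still need to be moved over with .cuda() / .to('cuda') - an available GPU isn't automatically used.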

7