Submitted by korec1234 t3_11t1857 in MachineLearning

We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 (248M parameters) model implemented in HuggingFace. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).
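For a rough sense of the recipe, here is a minimal sketch of that setup (not the repo's exact training code; the model and dataset identifiers below are the standard HuggingFace ones):

```python
# Minimal sketch of the recipe above, not nanoT5's actual training code.
from datasets import load_dataset
from transformers import AutoTokenizer, T5Config, T5ForConditionalGeneration

# Build the model from its config alone: random initialisation, no weights.
config = T5Config.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration(config)  # ~248M parameters

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

# English subset of C4, streamed so the full corpus never has to fit on disk.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
```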

In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.
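(RougeL here is the standard metric; the snippet below is an illustrative sketch of how such a score can be computed with the HuggingFace `evaluate` library, using dummy inputs rather than our actual evaluation pipeline.)

```python
# Illustrative only: computing a RougeL score like the one reported above.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["a generated answer from the model"]  # dummy model outputs
references = ["the gold answer from SNI"]            # dummy SNI targets
print(rouge.compute(predictions=predictions, references=references)["rougeL"])
```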

Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research.

We are keen to hear your suggestions to improve the codebase further.


Github: https://github.com/PiotrNawrot/nanoT5

Twitter: https://twitter.com/p_nawrot/status/1636373725397520384



258

Comments


learn-deeply t1_jchhzqo wrote

The value that nanoGPT offers is that it is a self-contained (minimal dependencies), easy to understand code. This repo is essentially a wrapper for huggingface's models, dataset and accelerator, which is not very useful for didactic purposes.

32

hosjiu t1_jcjey3z wrote

sure, but its main focus is to help people in the academic community run the pre-training phase themselves, for fast, cheap, and reproducible research experiments.

4

impossiblefork t1_jcgp1dt wrote

So this is actually cheap? About 20 USD?

22

currentscurrents t1_jcgtqwz wrote

...for a toy-sized 250M parameter language model, yes.

23

Dankmemexplorer t1_jchlw3t wrote

man it's funny that 250M is a toy now

how far we've come...

50

currentscurrents t1_jchn22q wrote

Computers have gotten literally 100 million times faster within my lifetime. I'm not even that old!

23

saintshing t1_jcjc3zs wrote

stolen from vitalik

>70 years is the time between the first computer and modern smart watches.

>70 years is more than the time between the first heavier-than-air flight and landing on the moon.

>70 years is 1.5x the time between the invention of public key cryptography and modern general-purpose ZK-SNARKs.

2

Oswald_Hydrabot t1_jci6a41 wrote

You don't need a nuclear bomb to hunt elk.

This is a solution you can fully own on top of that.

It has value.

6

crazymonezyy t1_jcixab6 wrote

That's an argument much more easily put forth philosophically than to a business head.

Because there's no valid answer to the follow-up questions "what if we do?" and "but what if our competition offers it?".

0

Oswald_Hydrabot t1_jcizf9y wrote

"We will use it only when nothing else can solve the problem", I believe is your answer.

There are solutions that cost less than GPT-4, and they don't require integrating a black box that is gatekept by a single provider. There is significant risk in taking on a product like GPT-4 as a dependency.

2

mysteriousbaba t1_jcj9u7q wrote

Especially now that OpenAI have stopped publishing details of what goes into their black box. GPT-4 is the first time they haven't revealed details of their training architecture or dataset generation in the technical report.

2

crazymonezyy t1_jcjfp9o wrote

> There are solutions that cost less than GPT-4, and they don't require integration of a black box that is gatekept by a single provider.

Management has a different perspective on costs than you and I do. The way cost-benefit is analyzed in a company is: if we increase the input cost by X%, can profit then increase by a corresponding Y% through greater scale (more contracts)? They are also shit scared of the new guy on the block, and of losing existing business to the 100 or so startups that will come up over the next week flashing the shiny new thing in front of customers. They also don't have the same perspective on openness that we do; they see black boxes as a partnership opportunity.

I'm not saying you're wrong; in fact I agree with your sentiment, and I've tried to put some of these arguments to my boss for why we should still be building products in-house instead of GPT-everything. What I realised is that when you talk to somebody on the business side, you get a very different response to the ironclad defense that works perfectly in your head.

1

Oswald_Hydrabot t1_jcpqshf wrote

Those are bad managers. I have certainly had these conversations, and I left companies over their response until I found one that listened.

You have to try harder. You have to stop accepting short-sighted near-term profit as "just how it is", or assuming that financial malpractice at scale is "good business". If you don't keep trying, failure is inevitable, and so are the corruption and corporate bailouts that take our tax revenue and cost us layoffs to pay for those mistakes. Stop being complacent if you cannot accept putting in the effort to make what you know is right a reality.

I have been involved in those conversations at the highest levels in some of the largest companies in the world. More often than not I told them to either listen to the consulting they PAID me for or I would take my business somewhere else, and I did. If you don't suck at what you do, firing bad clients will not hurt you; in fact it is critical to your own growth in your career. You need to treat your employer as a client.

1

impossiblefork t1_jch7ker wrote

Still, it's probably useful for research: validating alternatives to transformers, etc.

5

Oswald_Hydrabot t1_jci6kf3 wrote

This is more interesting than GPT-4 to me, by a great deal. Thank you for sharing!

Optimization and ownership of your full product is important. This is how we combat being locked out of the gated community, providing tangible value through running code.

I am going to check it out this evening!

15

cathie_burry t1_jcgqwkn wrote

How does it compare to current large language models in terms of efficacy, etc.?

3

Readorn t1_jdureq0 wrote

So let me get this straight: we can download this repository, train this nanoT5 model, and use it?

2

korec1234 OP t1_je2dtze wrote

Exactly, works great :)
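Roughly: clone the repo, run the pre-training and fine-tuning as described in the README, and the saved checkpoint then loads like any other HuggingFace T5 model. A hypothetical sketch (the checkpoint path is a placeholder, not the repo's actual output directory):

```python
# Hypothetical sketch: loading a finished checkpoint for inference.
# "./checkpoints/nanoT5" is a placeholder path, not the repo's real output dir.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("./checkpoints/nanoT5")
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

inputs = tokenizer("Summarize: nanoT5 pre-trains T5 on one GPU in ~20 hours.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```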

1

Readorn t1_je9653s wrote

I haven't downloaded it yet, but it makes me wonder: is it possible to use it with OCR?

1