Submitted by korec1234 t3_11t1857 in MachineLearning
We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 model (248M parameters) as implemented in HuggingFace Transformers. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).
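For context, here is a minimal sketch (not the nanoT5 code itself) of how a randomly initialised T5-v1.1 model and a streamed English C4 split can be set up with HuggingFace; the checkpoint and dataset identifiers are the public Hub names and are assumptions on my part:

```python
# Minimal sketch, assuming the public Hub identifiers for T5-v1.1 and C4.
# nanoT5 may wire these up differently internally.
from datasets import load_dataset
from transformers import AutoConfig, AutoTokenizer, T5ForConditionalGeneration

# Build the model from its config only -> randomly initialised weights,
# no pre-trained checkpoint is downloaded.
config = AutoConfig.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration(config)

# The tokenizer is the standard pre-trained SentencePiece tokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

# Stream the English subset of C4 so the full corpus never has to sit on disk.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
```

Streaming the dataset avoids downloading the whole corpus up front, which matters when the goal is a single-GPU, ~20-hour budget.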
In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.
Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research.
We are keen to hear your suggestions to improve the codebase further.
Github: https://github.com/PiotrNawrot/nanoT5
Twitter: https://twitter.com/p_nawrot/status/1636373725397520384
learn-deeply t1_jchhzqo wrote
The value that nanoGPT offers is that it is self-contained (minimal dependencies), easy-to-understand code. This repo is essentially a wrapper around HuggingFace's models, datasets and accelerator, which is not very useful for didactic purposes.