Submitted by korec1234 t3_11t1857 in MachineLearning
We release the code to reproduce the pre-training of a "Large Language Model" (T5) under a limited budget (1xA100 GPU, ~20 hours) in PyTorch. We start from the randomly initialised T5-base-v1.1 model (248M parameters) as implemented in HuggingFace Transformers. Next, we pre-train it on the English subset of the C4 dataset and then fine-tune it on Super-Natural Instructions (SNI).
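For context, here is a minimal sketch (not the nanoT5 code itself) of how a randomly initialised T5-v1.1 model and a streamed English C4 split can be set up with HuggingFace; the checkpoint and dataset identifiers are the public Hub names and are assumptions on my part:

```python
# Minimal sketch, assuming the public Hub identifiers for T5-v1.1 and C4.
# nanoT5 may wire these up differently internally.
from datasets import load_dataset
from transformers import AutoConfig, AutoTokenizer, T5ForConditionalGeneration

# Build the model from its config only -> randomly initialised weights,
# no pre-trained checkpoint is downloaded.
config = AutoConfig.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration(config)

# The tokenizer is the standard pre-trained SentencePiece tokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

# Stream the English subset of C4 so the full corpus never has to sit on disk.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
```

Streaming the dataset avoids downloading the whole corpus up front, which matters when the goal is a single-GPU, ~20-hour budget.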
In ~20 hours on a single GPU, we achieve ~40 RougeL on the SNI test set, compared to ~42 RougeL of the original model available on HuggingFace Hub and pre-trained through "a combination of model and data parallelism [...] on slices of Cloud TPU Pods", each with 1024 TPUs.
Our core contribution is not the T5 model itself, which follows the HuggingFace implementation. Instead, we optimise everything else in the training pipeline to offer you a user-friendly starting template for your NLP application/research.
We are keen to hear your suggestions to improve the codebase further.
Github: https://github.com/PiotrNawrot/nanoT5
Twitter: https://twitter.com/p_nawrot/status/1636373725397520384
learn-deeply t1_jchhzqo wrote
The value that nanoGPT offers is that it is self-contained (minimal dependencies), easy-to-understand code. This repo is essentially a wrapper around HuggingFace's models, datasets and accelerator, which is not very useful for didactic purposes.