Anjz t1_jc758w9 wrote
Blows my mind: they used a large language model to train a small one.
>Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s, which costs less than $100 on most cloud compute providers.
Now imagine what's possible with GPT-4 generating the instruction data for a smaller model, with a much bigger instruction set and corporate backing to run hundreds of A100s in parallel for days at a time.
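For anyone curious what that actually looks like, here's a rough sketch of the Alpaca-style recipe: use a large "teacher" model to generate instruction/response pairs, dump them in the instruction-tuning JSON format, then fine-tune the small model on that file. The `query_teacher` helper below is a hypothetical stand-in for whatever large-model API you'd call, and the fine-tuning step itself isn't shown.

    import json

    SEED_TASKS = [
        "Explain what a hash table is in one paragraph.",
        "Write a haiku about spring rain.",
    ]

    def query_teacher(prompt: str) -> str:
        """Hypothetical helper: send a prompt to a large model (e.g. GPT-4) and return its reply."""
        raise NotImplementedError("plug in your own large-model API call here")

    def build_dataset(seed_tasks, n_variations=3):
        examples = []
        for task in seed_tasks:
            for _ in range(n_variations):
                # Ask the teacher for a new instruction in the same style,
                # then ask it to answer that instruction.
                instruction = query_teacher(
                    f"Write one new instruction similar in style to: {task}"
                )
                output = query_teacher(instruction)
                examples.append(
                    {"instruction": instruction, "input": "", "output": output}
                )
        return examples

    if __name__ == "__main__":
        data = build_dataset(SEED_TASKS)
        # Alpaca-style JSON file; a 7B model would then be fine-tuned on this.
        with open("instruction_data.json", "w") as f:
            json.dump(data, f, indent=2)

The expensive part isn't the fine-tuning, it's generating enough good synthetic data, which is exactly what a bigger teacher model and more compute would scale up.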
That kind of rapid progress puts capable models on low-powered devices within reach already; it's not going to take years like people have predicted.