LetterRip t1_itchnjl wrote
Reply to comment by cygn in [R] Scaling Instruction-Finetuned Language Models - Flan-PaLM- Google 2022 - 75.2% on five-shot MMLU / Forecasters expected this SOTA would need until 2024! - Public checkpoints! by Singularian2501
I assume you mean 24 GB of VRAM? DeepSpeed, with enough CPU RAM and offloading to disk as needed, might let you run it. Note that 540B parameters is more than 2 TB at float32. Even at 8-bit you are looking at roughly 540 GB, while consumer hardware typically tops out at 128 GB of RAM, so the vast majority of the model would have to live on disk. The size can probably be reduced a lot by combining quantization and compression, but you will either have to do that work yourself or wait until someone else does.
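For concreteness, here is the back-of-the-envelope arithmetic behind those figures as a minimal Python sketch. The 540B parameter count comes from the comment above; the `weight_footprint_gb` helper is purely illustrative, not from any library:

```python
# Back-of-the-envelope size of the raw model weights at different precisions.
# Excludes activations, KV cache, and optimizer state, which add on top of this.

def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

n_params = 540e9  # 540B parameters, as discussed above

for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{label:>8}: ~{weight_footprint_gb(n_params, nbytes):,.0f} GB")

# float32: ~2,160 GB  (a bit over 2 TB)
# float16: ~1,080 GB
#    int8:   ~540 GB
#    int4:   ~270 GB
```

The "mapping to disk" part is what DeepSpeed calls ZeRO-Infinity, i.e. offloading parameters to CPU RAM and NVMe, so in principle it works, just very slowly at this scale.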
farmingvillein t1_itefjav wrote
> Note that 540B parameters is more than 2 TB for float 32
They only provide checkpoints up to the 11B model, however (unless I'm reading things wrong), so this is a moot point at the moment.
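(By the same arithmetic as above, the 11B checkpoint is roughly 44 GB at float32 and ~11 GB at 8-bit, so it is far more tractable on a single 24 GB card.)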