baffo32 t1_jcronvh wrote
Reply to comment by Meddhouib10 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
- offloading and acceleration (moving some parts of the model to memory-mapped disk or GPU RAM; this can also make for quicker loading — there's an mmap sketch after this list)
- pruning (removing weights that turn out to have little impact on outputs after training; see the magnitude-pruning sketch below)
- further quantization below 4 bits (a toy 3-bit example is below)
- distilling to a mixture of experts?
- factoring and distilling parts out into heuristic algorithms?
- finetuning to specific tasks (e.g. distilling/pruning out all information related to non-relevant languages or domains); this would likely make it very small
EDIT:
- numerous techniques published in papers over the past few years
- distilling into an architecture not limited by, e.g., the constraint of being feed-forward (a generic logit-distillation sketch follows)
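For the offloading point, here's a minimal sketch of the memory-mapping idea using numpy. The file name and shapes are made-up placeholders; a real runtime would mmap its own checkpoint format:

```python
import numpy as np

# Write a dummy weight matrix to disk once (a stand-in for a real checkpoint).
rows, cols = 1024, 1024
np.random.randn(rows, cols).astype(np.float16).tofile("weights.bin")

# Memory-map the file instead of loading it: the OS pages weights in on
# first touch, so startup is near-instant and untouched pages never
# occupy RAM (and can be evicted again under memory pressure).
w = np.memmap("weights.bin", dtype=np.float16, mode="r", shape=(rows, cols))

x = np.random.randn(cols).astype(np.float16)
y = w @ x  # only the pages this matmul touches get faulted in
```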
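For pruning, a toy sketch of unstructured magnitude pruning in PyTorch. Real pipelines usually prune gradually and fine-tune between rounds to recover accuracy, so treat this as an illustration of the idea only:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Zero out the smallest-magnitude fraction of weights.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() > threshold, weight,
                       torch.zeros_like(weight))

w = torch.randn(1024, 1024)
w_pruned = magnitude_prune(w, sparsity=0.5)
print((w_pruned == 0).float().mean())  # ~0.5
```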
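For sub-4-bit quantization, a toy symmetric 3-bit round-to-nearest quantizer with per-group scales. Published sub-4-bit schemes (GPTQ and friends) use smarter, error-compensating rounding, so this is purely to show where the bit savings come from:

```python
import numpy as np

def quantize_3bit(w, group_size=64):
    # Symmetric 3-bit levels (-3..3) with one float16 scale per group
    # of 64 weights: roughly 3.25 bits per weight overall.
    g = w.reshape(-1, group_size)
    scale = np.abs(g).max(axis=1, keepdims=True) / 3
    scale[scale == 0] = 1.0  # avoid divide-by-zero on all-zero groups
    q = np.clip(np.round(g / scale), -3, 3).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize(q, scale):
    return (q * scale).astype(np.float32)

w = np.random.randn(4096 * 64).astype(np.float32)
q, s = quantize_3bit(w)
print("mean abs error:", np.abs(dequantize(q, s).ravel() - w).mean())
```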
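On the last point, the nice thing about plain logit distillation is that the loss only sees the teacher's output distribution, so the student can be any architecture at all (RNN, state-space model, whatever). A minimal sketch; the teacher/student logits here are random stand-ins for real model outputs:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened distributions; the T*T
    # factor keeps gradient magnitudes comparable across temperatures.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# Toy tensors standing in for one batch of next-token logits.
vocab = 32000
teacher_logits = torch.randn(8, vocab)
student_logits = torch.randn(8, vocab, requires_grad=True)
distill_loss(student_logits, teacher_logits).backward()
```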