
baffo32 t1_jcronvh wrote

- offloading and accelerating (moving some parts to memory-mapped disk or GPU RAM; this can also make loading quicker); see the offloading sketch after this list

- pruning (removing parts of the model that didn't end up impacting outputs after training); see the pruning sketch after this list

- further quantization below 4 bits; see the group-wise quantization sketch after this list

- distilling to a mixture of experts?

- factoring and distilling parts out into heuristic algorithms?

- finetuning to specific tasks (e.g. distilling/pruning away all information related to irrelevant languages or domains); this would likely make it very small
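
a minimal sketch of the offloading idea, assuming the Hugging Face transformers + accelerate stack; the model id and offload folder below are placeholders, not a recommendation:

```python
# Sketch: let accelerate spread layers across GPU, CPU RAM, and disk.
# Assumes transformers + accelerate are installed; "some/causal-lm" and
# the "offload" folder are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some/causal-lm",
    device_map="auto",          # place layers on GPU/CPU automatically
    offload_folder="offload",   # layers that don't fit are memory-mapped to disk
)
```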
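
a minimal pruning sketch using PyTorch's built-in pruning utilities; the single Linear layer and the 30% sparsity level are arbitrary stand-ins, and real work would prune iteratively while checking that outputs don't degrade:

```python
# Sketch: unstructured magnitude (L1) pruning of one layer's weights.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)                            # stand-in for a model weight
prune.l1_unstructured(layer, name="weight", amount=0.3)  # zero the smallest 30%
prune.remove(layer, "weight")                            # make the pruning permanent
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.2%}")
```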
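
a naive group-wise 3-bit quantization sketch in plain PyTorch, just to show what "below 4 bits" means mechanically; real schemes (GPTQ-style and friends) are far more careful about which weights share a scale and how quantization error is compensated:

```python
# Sketch: quantize a weight tensor to 3 bits with one float scale per group of 64 weights.
import torch

def quantize_groupwise(w: torch.Tensor, bits: int = 3, group: int = 64):
    qmax = 2 ** bits - 1                       # 8 levels for 3 bits -> indices 0..7
    flat = w.reshape(-1, group)
    scale = flat.abs().max(dim=1, keepdim=True).values / (qmax / 2)
    scale = scale.clamp_min(1e-8)              # avoid division by zero
    q = torch.clamp(torch.round(flat / scale) + qmax // 2, 0, qmax)
    return q.to(torch.uint8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor, bits: int = 3, shape=None):
    qmax = 2 ** bits - 1
    return ((q.float() - qmax // 2) * scale).reshape(shape)

w = torch.randn(128, 128)                      # stand-in for a weight matrix
q, s = quantize_groupwise(w)
w_hat = dequantize(q, s, shape=w.shape)
print("mean abs reconstruction error:", (w - w_hat).abs().mean().item())
```

one scale per small group keeps the quantization error local while the per-weight storage stays near 3 bits (plus a little overhead for the scales).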

EDIT:

- numerous techniques published in papers over the past few years

- distilling into an architecture that isn't limited by, e.g., the constraint of being feed-forward; see the distillation loss sketch below
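
a minimal knowledge-distillation loss sketch; the student could be an MoE, a task-specific model, or a non-feed-forward architecture, and the logits here are random stand-ins for a frozen teacher and a trainable student:

```python
# Sketch: train a smaller student to match a frozen teacher's softened outputs.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # soften both distributions, then match them with KL divergence
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# toy usage with random logits standing in for real model outputs
teacher_logits = torch.randn(8, 32000)                        # frozen teacher
student_logits = torch.randn(8, 32000, requires_grad=True)    # trainable student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```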

3

Art10001 t1_jcwfyw8 wrote

I heard MoE is bad. I have no sources, sadly.

1

baffo32 t1_jcxqr2i wrote

i visited CVPR last year and people were saying that MoE was mostly what was being used; i haven't tried these things myself though

1