MysteryInc152 t1_jal7d3p wrote

Distillation doesn't work for token-predicting language models, for some reason.

3

currentscurrents t1_jalajj3 wrote

DistilBERT worked, though?

2

MysteryInc152 t1_jalau7e wrote

Sorry, I meant the really large-scale models. Nobody has gotten a GPT-3- or Chinchilla-scale model to actually distill properly.

6