Submitted by starstruckmon t3_1027geh in MachineLearning
matth0x01 t1_j2u5rwm wrote
Reply to comment by bloc97 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Sorry - What's meant by perplexity here?
prototypist t1_j2uskwt wrote
It's a metric that compares the model's predicted probabilities for the next token against the text that actually occurs.
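A minimal sketch of that idea (the numbers are illustrative, not from the thread): given the probabilities a model assigned to the tokens that actually appeared, perplexity is 2 raised to the average negative log2-probability per token.

```python
import math

# Hypothetical per-token probabilities a model assigned to the actual text.
probs = [0.5, 0.25, 0.1, 0.25]

# Cross-entropy in bits per token: average of -log2(p).
cross_entropy = -sum(math.log2(p) for p in probs) / len(probs)

# Perplexity = 2 ** cross-entropy; equivalently the inverse geometric
# mean of the assigned probabilities.
perplexity = 2 ** cross_entropy
print(round(cross_entropy, 4), round(perplexity, 4))  # → 2.0805 4.2295
```

Lower perplexity means the model assigned higher probability to the text that actually occurred.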
matth0x01 t1_j2vxl6g wrote
Thanks! Hm, seems to be a measure of sharpness for the predicted words?
unkz t1_j2v9edv wrote
matth0x01 t1_j2vx7z4 wrote
Yes, I know the concept, but where's the connection to the pruning approach here?
unkz t1_j2wzgf3 wrote
Perplexity is one of the key metrics for evaluating how well a language model models language. Pruning the model decreases perplexity here (i.e., makes the model better), which is interesting.
matth0x01 t1_j2x49gm wrote
Thanks - I think I got it. It's new to me why language models are evaluated with perplexity rather than log-likelihood, which is a monotonic function of perplexity.
From Wikipedia, it seems perplexity is measured in units of "words" rather than nats/bits, which might be more interpretable.
Are there other advantages I overlook?
unkz t1_j2x7ggd wrote
That’s basically it. Cross-entropy (the average negative log-likelihood per token) and perplexity are related by
Perplexity = 2^(cross-entropy)
So the two main advantages are interpretability (perplexity is roughly how many words the model is effectively choosing from at any point) and scale (small changes in cross-entropy result in large changes in perplexity).
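Both points can be made concrete with the uniform case (a toy illustration, not from the thread): a model that is evenly unsure between k words has cross-entropy log2(k) bits and perplexity exactly k, and a modest shift in cross-entropy moves perplexity a lot.

```python
import math

# Uniform uncertainty over k words: entropy = log2(k) bits, perplexity = k.
k = 8
entropy = -sum((1 / k) * math.log2(1 / k) for _ in range(k))  # 3.0 bits
perplexity = 2 ** entropy  # 8.0 "effective words"

# Interpretability: perplexity reads directly as a branching factor.
# Scale: shaving just 0.5 bits off cross-entropy cuts perplexity ~29%.
improved = 2 ** (entropy - 0.5)  # ≈ 5.657
print(perplexity, round(improved, 3))
```

This is why small cross-entropy improvements between models show up as visibly large perplexity gaps on benchmarks.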