Submitted by starstruckmon t3_1027geh in MachineLearning
unkz t1_j2x7ggd wrote
Reply to comment by matth0x01 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
That’s basically it. Cross entropy (the average negative log likelihood per token, measured in bits) and perplexity are related by

Perplexity = 2^(cross entropy)
So the two main things are interpretability (perplexity measures how many words the model is effectively choosing from at any point, i.e. its average branching factor) and scale (small changes in cross entropy produce large changes in perplexity, since the relationship is exponential).
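The relationship can be checked with a minimal sketch; the per-token probabilities here are made up for illustration:

```python
import math

# Hypothetical probabilities a model assigned to the correct next token.
probs = [0.25, 0.5, 0.125, 0.25]

# Cross entropy in bits: average negative log2 likelihood per token.
cross_entropy = -sum(math.log2(p) for p in probs) / len(probs)

# Perplexity is the exponentiated cross entropy.
perplexity = 2 ** cross_entropy

print(cross_entropy)  # 2.0
print(perplexity)     # 4.0 -- the model behaves as if choosing among 4 words
```

Note the scale effect: raising cross entropy from 2.0 to 3.0 bits (a one-unit change) doubles perplexity from 4 to 8.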