unkz t1_j2x7ggd wrote
Reply to comment by matth0x01 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
That’s basically it. Cross-entropy (the average negative log-likelihood per token) and perplexity are related by
Perplexity = 2^(cross-entropy), when cross-entropy is measured in bits
So the two main reasons to use perplexity are interpretability (perplexity is roughly the effective number of words the model is choosing among at each point) and scale (small changes in cross-entropy correspond to large changes in perplexity).
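The relationship can be sketched in a few lines of Python (a minimal illustration; the `perplexity` helper and the example probabilities are hypothetical, and log base 2 is used so cross-entropy comes out in bits):

```python
import math

def perplexity(token_probs):
    """Perplexity of a model that assigned each true token the given probability."""
    # Cross-entropy in bits: average negative log2-probability per token.
    cross_entropy = -sum(math.log2(p) for p in token_probs) / len(token_probs)
    # Perplexity = 2^cross-entropy: the effective branching factor.
    return 2 ** cross_entropy

# A model that is always choosing uniformly among 4 tokens:
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # 4.0
```

Note how the "effective vocabulary" reading falls out: uniform uncertainty over 4 tokens gives perplexity exactly 4, and shaving the cross-entropy by just 1 bit would halve it.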
unkz t1_j2wzgf3 wrote
Reply to comment by matth0x01 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Perplexity is one of the key evaluation metrics for how well a language model models language, and lower is better. What's interesting here is that pruning actually decreased perplexity for one model, i.e. made it better.
unkz t1_j2ujn5z wrote
Reply to comment by SoulCantBeCut in [R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder. by olegranmo
Please don’t, I think we have all heard enough from him.
unkz t1_iudsx4n wrote
Reply to Believe it or not, Ember is more than half husky and malamute (the rest is blue tick coon hound and pit bull) by overcomebyfumes
I guess this is being posted because of the pit bull mauling video that’s going around eh.
unkz t1_je9wuzm wrote
Reply to comment by saintshing in [R] LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention by floppy_llama
Practically speaking, it does have a context limit; the underlying RNN problem of retaining information over long ranges has not really been solved. It is a lot of fun to play with though.