Submitted by _atswi_ t3_11aq4qo in MachineLearning
What's the best way to quantify the uncertainty of a trained LLM? I assume the entropy of the model's final probability distribution is a decent measure. Just wanted to know if the NLP community sticks to this measure, or if there's something more specific to language?
Would really appreciate recent references that may have popped up over the past few months (if any). Pointers to any cool & easy-to-integrate implementations would also be great. Thanks!
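For concreteness, this is roughly what I have in mind by "entropy of the final distribution" - a minimal sketch using HuggingFace `transformers`, with GPT-2 standing in for whatever causal LM you actually use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is just a placeholder; any causal LM from the hub should work the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Entropy of the next-token distribution at the last position.
probs = torch.softmax(logits[0, -1], dim=-1)
entropy = -(probs * torch.log(probs + 1e-12)).sum()
print(f"next-token entropy: {entropy.item():.3f} nats")
```

Averaging this per-token entropy over a generated sequence is the simplest sequence-level version I know of, which is why I'm asking whether the NLP community uses something more language-specific.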
pyepyepie t1_j9uanug wrote
In all honesty, at some point any evaluation that isn't qualitative is simply a joke. I observed this a long time ago while working on NMT and trying to base results on BLEU score - it literally meant nothing. Trying to force new metrics based on simple rules or computations will probably fail; I believe we need humans or stronger LLMs in the loop. For example, the same group of humans should rank the outputs of several different language models, not just the new one, so the comparison is consistent. Otherwise I view it as a meaningless, self-promoting paper (LLMs are not interesting enough to read about if there are no new ideas and no better performance). Entropy is fine for language models that are like "me language model me no understand world difficult hard", not for GPT-3-class models.
Edit: the semantic uncertainty approach looks interesting, but I would still rather let humans rank the results.
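For anyone curious, the gist of semantic uncertainty (as I understand it) is: sample several generations, group them into clusters that mean the same thing (e.g. via bidirectional entailment with an NLI model), and compute entropy over the clusters instead of over surface strings. A rough sketch - not the paper's actual implementation, and with the clustering and sequence log-probs assumed to be precomputed:

```python
import math
from collections import defaultdict

def semantic_entropy(seq_logprobs, cluster_ids):
    """Entropy over semantic clusters of sampled generations.

    seq_logprobs: log-probability the model assigned to each sampled sequence.
    cluster_ids:  cluster label per sequence (e.g. from an NLI-based
                  equivalence check) -- assumed to be computed elsewhere.
    """
    # Aggregate sequence probability mass within each semantic cluster.
    cluster_mass = defaultdict(float)
    for lp, cid in zip(seq_logprobs, cluster_ids):
        cluster_mass[cid] += math.exp(lp)

    # Normalize over the sampled set and take entropy over clusters.
    total = sum(cluster_mass.values())
    return -sum((m / total) * math.log(m / total) for m in cluster_mass.values())

# Toy usage: three samples, two of which land in the same semantic cluster.
print(semantic_entropy([-1.2, -1.5, -3.0], ["a", "a", "b"]))
```

Two paraphrases of the same answer then count as one outcome, which is exactly the language-specific twist plain token entropy misses - but I'd still trust human rankings over either number.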