
Bot-69912020 t1_j24hkd7 wrote

It might be more transparent to split your approach into two steps. First, we try to get a valid probability vector for each prediction (i.e., the vector sums to 1). Second, we try to recalibrate the probabilities in each vector to improve the correctness of the predicted probabilities.

For the first step, it is important to know the range of your invalid outputs: if they contain negative as well as positive values, you might want to transform your whole output via the softmax function. If you only have positive values v1, ..., vm, but the sum of the vector is not 1, then it is sufficient to compute vi / (v1 + ... + vm) to get valid probabilities.
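A minimal sketch of both cases (assuming your outputs come as a NumPy array of shape n_samples x n_classes; the function name is just illustrative):

```python
import numpy as np

def to_probabilities(outputs):
    """Turn raw model outputs into valid probability vectors (each row sums to 1)."""
    outputs = np.asarray(outputs, dtype=float)
    if (outputs < 0).any():
        # Mixed-sign outputs: treat them as logits and apply softmax row-wise.
        shifted = outputs - outputs.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(shifted)
        return exp / exp.sum(axis=1, keepdims=True)
    # All-positive outputs: divide each row by its sum.
    return outputs / outputs.sum(axis=1, keepdims=True)

raw = np.array([[2.0, 1.0, 0.5],    # positive, but sums to 3.5
                [-1.0, 0.3, 2.0]])  # mixed signs
print(to_probabilities(raw))  # each row now sums to 1
```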

Now, we can try to improve the predicted probabilities via post-hoc recalibration. Several methods have been proposed for this, but the simplest baseline, which works surprisingly well in most cases, is temperature scaling. Start with that and try to make it work - it almost always gives at least minor improvements in ECE and NLL (don't use ECE alone, it is unreliable; see Fig. 2). Once TS works, you can still try out ensemble temperature scaling, parametrized temperature scaling, intra-order-preserving scaling, splines, ... (a sketch of temperature scaling follows below).
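Here is a rough sketch of temperature scaling, assuming you hold out a validation set of logits and integer labels (names and the optimizer bounds are my choice, not from any particular library): a single scalar T is fitted by minimizing validation NLL, and at test time you just divide the logits by T before the softmax.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(logits, labels, T):
    """Mean negative log-likelihood of softmax(logits / T)."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    """Find the single scalar T > 0 that minimizes validation NLL."""
    res = minimize_scalar(lambda T: nll(val_logits, val_labels, T),
                          bounds=(0.05, 20.0), method="bounded")
    return res.x

# Hypothetical validation data, just to make the sketch runnable.
rng = np.random.default_rng(0)
val_logits = rng.normal(size=(500, 5)) * 3.0   # deliberately overconfident
val_labels = rng.integers(0, 5, size=500)

T = fit_temperature(val_logits, val_labels)
print("fitted temperature:", T)
# At test time: calibrated probabilities = softmax(test_logits / T)
```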

Some of these methods (including temperature scaling) take logits as inputs and their outputs are logits again. So, to obtain logits, apply the multivariate logit function if you already have probabilities, or simply use your untransformed outputs as logits if you would have used softmax in the first step.
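For the probabilities-to-logits direction, a minimal sketch (assuming probabilities as a NumPy array; taking the element-wise log recovers logits up to an additive constant per row, which the softmax ignores anyway):

```python
import numpy as np

def probs_to_logits(probs, eps=1e-12):
    """Multivariate logit: recover logits (up to a per-row constant) from probabilities."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return np.log(probs)

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

p = np.array([[0.7, 0.2, 0.1]])
print(softmax(probs_to_logits(p)))  # recovers the original probabilities
```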
