[D] Recent ML papers to implement from scratch
Submitted by nullspace1729 t3_y0dk5c on October 10, 2022 at 12:29 PM in MachineLearning
nullspace1729 t1_irv5z1j wrote on October 11, 2022 at 7:57 AM
Reply to [D] Classification with final layer having no activation? by AbIgnorantesBurros
It’s because of something called the log-sum-exp trick. If you fuse the activation (softmax or sigmoid) with the cross-entropy loss, you can compute the log-probabilities directly from the logits, which stays numerically stable even when the predicted probabilities are very close to 0 or 1 (i.e., when the logits have large magnitude).
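To make that concrete, here’s a minimal NumPy sketch (not from the original thread; the function names and example logits are just illustrative) comparing a naive softmax-then-log cross-entropy with a fused version that uses the log-sum-exp trick:

```python
import numpy as np

def naive_cross_entropy(logits, target):
    # Naive: compute softmax first, then take the log.
    # exp() of a large logit overflows to inf, giving nan.
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return -np.log(probs[target])

def stable_cross_entropy(logits, target):
    # Fused log-softmax via the log-sum-exp trick:
    # log softmax(x)_t = x_t - (m + log(sum(exp(x - m)))), m = max(x).
    # Subtracting the max keeps every exp() argument <= 0, so nothing overflows.
    m = np.max(logits)
    log_sum_exp = m + np.log(np.sum(np.exp(logits - m)))
    return log_sum_exp - logits[target]

logits = np.array([1000.0, -5.0, 3.0])  # extreme logits to trigger overflow
print(naive_cross_entropy(logits, 0))   # nan: exp(1000) overflows
print(stable_cross_entropy(logits, 0))  # ~0.0, computed stably
```

This is also why losses like PyTorch’s nn.CrossEntropyLoss and nn.BCEWithLogitsLoss take raw logits rather than probabilities: the fused computation never has to exponentiate a large logit on its own.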