Shot_Expression8647 t1_irw2ihl wrote on October 11, 2022 at 2:02 PM

Reply to [D] Classification with final layer having no activation? by AbIgnorantesBurros

Consider a neural network with no hidden layers, i.e., logistic regression. You can think of it as learning a linear function wx + b with the logistic loss. If we replace the logistic loss with the hinge loss, we get a linear SVM (strictly speaking, the standard SVM objective also adds an L2 penalty on w). There are other losses you can use as well (e.g., the exponential loss, which is the one AdaBoost minimizes), but I’m not sure the resulting models all have names. So leaving the output untransformed separates the linear score from the loss, and the loss becomes a part of the model that we can easily switch out to get other models.
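To make the "switch out the loss" idea concrete, here is a minimal NumPy sketch. It assumes labels in {-1, +1} and writes each loss as a function of the margin y * (wx + b). The function names and the toy data are made up for illustration and don't come from any particular library.

```python
import numpy as np

def score(w, b, X):
    """Raw, untransformed output of the linear model: wx + b."""
    return X @ w + b

# Each loss is a function of the margin m = y * score, with y in {-1, +1}.
def logistic_loss(m):
    # log(1 + exp(-m)), computed stably -> logistic regression
    return np.logaddexp(0.0, -m)

def hinge_loss(m):
    # max(0, 1 - m) -> linear SVM (without the usual L2 penalty)
    return np.maximum(0.0, 1.0 - m)

def exponential_loss(m):
    # exp(-m) -> the loss AdaBoost minimizes
    return np.exp(-m)

def mean_loss(loss_fn, w, b, X, y):
    """Average loss over the dataset for a given choice of loss."""
    return float(np.mean(loss_fn(y * score(w, b, X))))

# Toy data and parameters (hypothetical, only to exercise the code).
X = np.array([[1.0, 2.0], [-1.0, -1.5], [0.5, -0.5]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.5, 0.5])
b = 0.0

for name, fn in [("logistic", logistic_loss),
                 ("hinge", hinge_loss),
                 ("exponential", exponential_loss)]:
    print(name, mean_loss(fn, w, b, X, y))
```

The model (score) is identical in all three cases. Only the function applied to the raw output changes, which is the point about keeping the last layer free of an activation.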

In practice, you might also leave the last layer untransformed if you want to reuse its output as an embedding for another task.