[D] Classification with final layer having no activation? Submitted by AbIgnorantesBurros t3_y0y3q6 on October 11, 2022 at 3:11 AM in MachineLearning 7 comments 6
sanderbaduk t1_irv1lo8 wrote on October 11, 2022 at 6:53 AM For classification, taking the argmax of the logits gives the same answer as taking the argmax of the probabilities, because softmax and sigmoid are monotonic and preserve the ordering of the outputs. For training, folding the softmax or sigmoid into the loss function, so the loss is computed directly from the logits, can be more numerically stable.
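To make both points concrete, here is a rough NumPy sketch, not anything from the original post. The function names (`softmax`, `cross_entropy_from_logits`, `naive_cross_entropy`) and the example logits are made up for illustration, and the fused loss uses the standard log-sum-exp trick, which is one common way to get the stability benefit rather than necessarily what any particular framework does internally.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max before exponentiating so exp() cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(logits, labels):
    # Fused log-softmax + negative log-likelihood (log-sum-exp trick).
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def naive_cross_entropy(logits, labels):
    # Unfused version: explicit softmax with no max shift, then a separate log.
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

# Hypothetical logits for two samples and three classes.
logits = np.array([[2.0, -1.0, 0.5], [0.1, 3.0, -2.0]])
labels = np.array([0, 1])

# Point 1: the predicted class is the same with or without the activation.
same_prediction = np.array_equal(
    logits.argmax(axis=-1), softmax(logits).argmax(axis=-1)
)

# Point 2: with extreme logits the unfused loss breaks, the fused one does not.
large_logits = np.array([[1000.0, 0.0, -1000.0]])
large_labels = np.array([0])
stable_loss = cross_entropy_from_logits(large_logits, large_labels)
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive_loss = naive_cross_entropy(large_logits, large_labels)

print(same_prediction, stable_loss, naive_loss)
```

On ordinary logits the two loss functions agree; they only diverge once the logits get large enough for `exp` to overflow, which is the situation where leaving the final layer without an activation and using a loss that takes logits tends to pay off.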