sanderbaduk t1_irv1lo8 wrote

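For classification, you get the same answer taking the argmax of the logits as taking the argmax of the probabilities, because softmax preserves the ordering of its inputs. For training, combining the softmax or sigmoid with the loss function in a single operation can be more numerically stable than applying them separately.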
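
To make both points concrete, here is a rough sketch in plain Python (standard library only). The function names are made up for illustration and are not from any particular framework; the fused loss uses the usual log-sum-exp identity, which I'm assuming is roughly what library "from logits" losses do internally.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest element.
    return max(range(len(values)), key=lambda i: values[i])

def cross_entropy_from_logits(logits, target):
    # Fused form: -log softmax(z)[t] = logsumexp(z) - z[t].
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]

def cross_entropy_naive(logits, target):
    # Softmax first, then log: the probability can underflow to 0.0,
    # in which case math.log raises ValueError.
    p = softmax(logits)[target]
    return -math.log(p)

# Point 1: the predicted class is the same either way.
logits = [2.0, -1.0, 0.5]
same_prediction = argmax(logits) == argmax(softmax(logits))

# Point 2: with extreme logits the separate softmax + log breaks down,
# while the fused version stays finite.
extreme = [1000.0, 0.0]
stable_loss = cross_entropy_from_logits(extreme, 1)
try:
    naive_loss = cross_entropy_naive(extreme, 1)
except ValueError:
    naive_loss = None  # log(0.0) is undefined
```

With ordinary logits the two loss functions agree to within floating-point error; the difference only shows up when one logit dominates and the target probability underflows.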
