Submitted by AutoModerator t3_110j0cp in MachineLearning
trnka t1_j91xnym wrote
Reply to comment by TrainquilOasis1423 in [D] Simple Questions Thread by AutoModerator
In terms of probabilities, yeah, that's right.
In the actual code, it's most common to do a softmax over the output vocabulary. In practice that means the model computes a probability for every possible next output (whether word or subword), and then we either take the argmax or keep the top K, depending on the problem.
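A minimal sketch of that step with NumPy, using made-up logits over a tiny hypothetical 5-token vocabulary:

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# hypothetical logits the model might produce for a 5-token vocabulary
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
probs = softmax(logits)              # probability of every possible next token

best = int(np.argmax(probs))         # greedy: the single most likely token
top_k = np.argsort(probs)[::-1][:3]  # indices of the 3 most probable tokens
```

A real model would produce the logits from the preceding context; here they're just fixed numbers to show the softmax/argmax/top-K step.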
I think about generating one word at a time as a key part of the way we're searching through the space of probable sentences, because we can't afford to brute-force search.
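The one-token-at-a-time view can be sketched as a greedy decoding loop. The `toy_next_token_probs` function below is a stand-in for a real language model (an assumption, not anyone's actual API); the point is only that each step picks one token instead of enumerating whole sentences:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_next_token_probs(prefix, vocab_size=5):
    # stand-in for a real language model: returns a probability
    # distribution over the next token given the prefix so far
    logits = rng.normal(size=vocab_size)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def greedy_decode(max_len=5, eos=0):
    # generate one token at a time, always taking the argmax;
    # this follows a single path through the space of sequences
    # instead of brute-forcing all vocab_size**max_len of them
    tokens = []
    for _ in range(max_len):
        probs = toy_next_token_probs(tokens)
        nxt = int(np.argmax(probs))
        if nxt == eos:  # stop at the end-of-sequence token
            break
        tokens.append(nxt)
    return tokens

seq = greedy_decode()
```

Greedy decoding is the simplest such search; beam search keeps the top K partial sequences at each step instead of just one.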