
t1_j91uvav wrote

Reply to comment in [D] Simple Questions Thread

Makes sense. To expand on the number of possible iterations: wouldn't it be something akin to a collapsing wave function? Iterating through every possible response would be impossible, but the list of probable responses shrinks as the context expands.

For example, if I just input "knock", there are far too many possible sentences to search, but if I input "knock knock", the most likely response is "who's there?" A simple example, sure, but you get the point, yeah?


t1_j91xnym wrote

In terms of probabilities, yeah, that's right.

In the actual code, it's most common to do a softmax over the output vocabulary. In practice that means the model computes a probability for every possible next output (whether a word or a subword), and then we sort those probabilities and take the argmax, or keep the top k, depending on the problem.
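As a rough illustration (not code from any particular model), here's what that softmax-then-argmax/top-k step looks like in NumPy, with a made-up five-token vocabulary and made-up logits:

```python
import numpy as np

vocab = ["who's", "there", "knock", "the", "door"]   # hypothetical tiny vocabulary
logits = np.array([3.2, 1.1, 0.3, -0.5, -1.0])       # raw scores the model would output

# Softmax turns the logits into a probability for every possible next token.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding: take the argmax.
print(vocab[int(np.argmax(probs))])                   # -> "who's"

# Top-k: keep only the k most probable tokens and renormalize.
k = 2
top_idx = np.argsort(probs)[::-1][:k]
top_probs = probs[top_idx] / probs[top_idx].sum()
print([(vocab[i], round(float(p), 3)) for i, p in zip(top_idx, top_probs)])
```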

I think of generating one word at a time as a key part of how we search through the space of probable sentences, because we can't afford a brute-force search.
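A minimal sketch of that one-token-at-a-time loop, assuming a hypothetical `model` function that returns logits over the vocabulary given the context so far:

```python
import numpy as np

def generate(model, context, max_new_tokens=10, eos_id=0):
    """Greedy autoregressive decoding: pick the most probable next token,
    append it to the context, and repeat until done."""
    tokens = list(context)
    for _ in range(max_new_tokens):
        logits = model(tokens)             # hypothetical: logits over the vocabulary
        next_id = int(np.argmax(logits))   # greedy choice; no brute-force search over sentences
        tokens.append(next_id)
        if next_id == eos_id:              # stop at end-of-sequence
            break
    return tokens
```

Each step only scores the vocabulary once, conditioned on what's already been generated, which is why the search stays tractable.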
