Viewing a single comment thread. View all comments

astrange t1_jdy6d4f wrote

But nobody uses the base model, and when they did use it, it was only interesting because it fails to predict the next word and therefore generates new text. A model that successfully predicts the next word all the time given existing text would be overfitting, since it would only produce things you already have.

1

Rioghasarig t1_jdz24za wrote

People were using the base model when it first came out and some people are still using it today. The game AI Dungeon is still runs on what is essentially a transformer trained on next token prediction. So it would be accurate to say "It's just (attempts to) outputs the next most probable word" .

1