

PengsoonThePenguin t1_j16rrtg wrote

I guess an easy explanation is that the model works solely by retrieval over the corpus: every prediction has to come from, and be explainable by, the corpus.
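
For intuition, here is a minimal toy sketch of what "prediction = retrieval" means in practice. The encoder and phrase index below are made-up stand-ins for illustration, not the actual model or its API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a learned encoder; returns a unit-norm vector.
    (A real model would use a trained transformer here.)"""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)

# The "vocabulary" is just phrases that literally occur in the corpus,
# indexed once, offline.
corpus_phrases = ["Thomas Edison", "the light bulb", "Menlo Park"]
phrase_index = {p: embed(p) for p in corpus_phrases}

def predict(masked_context: str) -> str:
    """Prediction is nearest-neighbor search, not a softmax over a
    fixed vocabulary: the answer must be a phrase from the corpus."""
    q = embed(masked_context)
    return max(phrase_index, key=lambda p: phrase_index[p] @ q)

print(predict("[MASK] invented the phonograph."))
```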

3

drd13 t1_j1h3gvy wrote

Similarly to T5 (and BERT), the model is pre-trained by predicting randomly masked spans of words. However, the way these spans are predicted is different.

In T5, masked words are generated autoregressively, one token at a time (i.e. a softmax over the vocabulary produces each token). Here, a set of candidate spans covering the whole training corpus is built in advance, and the model scores all the candidate spans and picks the one it thinks is best, trained with a contrastive loss.
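
Roughly, the two objectives look like this. This is a hedged sketch: the shapes, names, and dummy tensors are illustrative assumptions, not code from either paper:

```python
import torch
import torch.nn.functional as F

vocab_size, hidden = 32000, 768
span_query = torch.randn(1, hidden)  # encoding of the masked position

# T5 / BERT style: project to a fixed vocabulary and predict the gold
# token with cross-entropy, one token at a time.
vocab_proj = torch.nn.Linear(hidden, vocab_size)
token_logits = vocab_proj(span_query)                 # (1, vocab_size)
t5_loss = F.cross_entropy(token_logits, torch.tensor([1234]))  # gold token id

# Contrastive span selection: score precomputed encodings of candidate
# spans drawn from the corpus, and pull the query toward the gold span
# (an InfoNCE-style loss over candidates instead of vocabulary tokens).
num_candidates = 1024
candidate_spans = torch.randn(num_candidates, hidden)  # built over the corpus
scores = span_query @ candidate_spans.T                # (1, num_candidates)
gold_span = torch.tensor([0])                          # index of the true span
contrastive_loss = F.cross_entropy(scores, gold_span)

# At inference the prediction is simply the best-scoring corpus span:
predicted = scores.argmax(dim=-1)
```

The practical difference is that the softmax over a fixed vocabulary is replaced by a similarity search, so the model's output space is whatever spans actually exist in the corpus.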

2