Submitted by soraki_soladead t3_zmoxp7 in MachineLearning
2600_yay t1_j0fc7j4 wrote
In "Are Neighbors Enough", the authors replace self-attention in a Transformer with a multi-head neural n-gram model. Maybe that's what you're looking for?
soraki_soladead OP t1_j0gemzb wrote
It isn't, but it's an interesting paper!