[D] Trying to find paper about n-grams in early transformer layers
Submitted by soraki_soladead t3_zmoxp7 on December 15, 2022 at 4:13 PM in MachineLearning (9 comments)
aps692 t1_j0dathj wrote on December 15, 2022 at 8:31 PM
Is this the one? SkipBERT

soraki_soladead OP t1_j0desco wrote on December 15, 2022 at 8:56 PM
Reading through it now. It was on my reading list but it doesn’t look familiar.