[R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python
Submitted by bo_peng t3_11iwt1b on March 5, 2023 at 1:11 PM in MachineLearning 26 comments 63
Spare_Side_5907 t1_jb0rhw1 wrote on March 5, 2023 at 3:41 PM
Is this similar to Toeplitz Neural Network for Sequence Modeling (https://openreview.net/forum?id=IxmWsm4xrua)?

bo_peng OP t1_jb1qws0 wrote on March 5, 2023 at 7:43 PM
TNN is like convolution, while RWKV can be written as a CNN too (RWKV v1 is a CNN). So there's some similarity, though not much :)

estrafire t1_jc2umln wrote on March 13, 2023 at 5:15 PM
Any particular reason for moving from CNN to RNN?
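
For readers wondering how a recurrent model "can be written as a CNN", here is a minimal sketch. It is not the RWKV formula: it assumes a toy scalar recurrence with a fixed decay `w`, and the function names `decay_rnn` and `decay_cnn` are made up for illustration. The point is only that a linear recurrence with exponential decay yields the same outputs as a causal convolution whose kernel is `w**j`.

```python
def decay_rnn(xs, w):
    """Recurrent form: carry a single running state from step to step."""
    state = 0.0
    out = []
    for x in xs:
        # Toy update (assumption, not RWKV's actual rule): decay, then add input.
        state = w * state + x
        out.append(state)
    return out


def decay_cnn(xs, w):
    """Convolutional form: causal convolution with kernel k[j] = w**j."""
    kernel = [w ** j for j in range(len(xs))]
    return [
        sum(kernel[j] * xs[t - j] for j in range(t + 1))
        for t in range(len(xs))
    ]


xs = [1.0, 2.0, 3.0, 4.0]
w = 0.5
rnn_out = decay_rnn(xs, w)
cnn_out = decay_cnn(xs, w)
print(rnn_out)  # [1.0, 2.5, 4.25, 6.125]
print(cnn_out)  # [1.0, 2.5, 4.25, 6.125]
```

In this toy setting the convolutional form can process all positions in parallel, while the recurrent form needs only constant state per step at generation time. Whether that trade-off is the reason for the change in RWKV is not stated in this thread.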