[R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM)
Submitted by bo_peng t3_11teywc on March 17, 2023 at 2:49 AM in MachineLearning 32 comments 101
ThePerson654321 t1_jcjbrg5 wrote on March 17, 2023 at 6:36 AM
Wen paper?

bo_peng OP t1_jcjupnc wrote on March 17, 2023 at 11:00 AM
Soon :) working on it. Meanwhile, take a look at https://github.com/ridgerchu/SpikeGPT which is an SNN version of RWKV, so its paper has some explanation.