bo_peng OP t1_jcjuejz wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
ChatRNN is indeed a great name :)
R, W, K, and V are the four major parameters in RWKV (similar to Q, K, V for attention).
I guess you can pronounce it like "Rwakuv" (a bit like "raccoon").