[R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM)
Submitted by bo_peng t3_11teywc on March 17, 2023 at 2:49 AM in MachineLearning, 32 comments
[deleted] t1_jclws3b wrote on March 17, 2023 at 7:54 PM, in reply to a comment by yehiaserag
[deleted]

yehiaserag t1_jcm31zk wrote on March 17, 2023 at 8:36 PM
We say RWKV for short, the rest of the stuff is for a specific version

[deleted] t1_jcs3icv wrote on March 19, 2023 at 3:09 AM
[removed]

[deleted] t1_jcs3lrc wrote on March 19, 2023 at 3:10 AM
[removed]