turnip_burrito t1_jcoul9i wrote
Reply to comment by MysteryInc152 in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yes, exactly. Everyone keeps leaving the architecture's inductive structural priors out of the discussion.
It's not all about data! The model matters too!