FallUpJV t1_jclpydo wrote
Reply to comment by xEdwin23x in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yes, it's definitely not small. I meant compared to the models people have been paying the most attention to over the last few years, I guess.
The astronaut pointing gun meme is a good analogy, almost a scary one. I wonder how much we could improve existing models simply with better data.