xEdwin23x t1_jcjfnlj wrote
Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
First, this is not a "small" model, so size DOES matter. It may not have hundreds of billions of parameters, but it's definitely not small imo.
Second, it always has been about data (astronaut pointing gun meme).
FallUpJV t1_jclpydo wrote
Yes, it's definitely not small. I meant compared to the models people have been paying the most attention to over the last few years, I guess.
The astronaut pointing gun meme is a good analogy, almost a scary one. I wonder how much we could improve existing models simply with better data.