MysteryInc152 t1_jclpjzi wrote on March 17, 2023 at 7:07 PM

Reply to comment by FallUpJV in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng

It's predicting language. As long as the structure allows the model to learn to predict language properly, you're good to go.

turnip_burrito t1_jcoul9i wrote on March 18, 2023 at 12:41 PM

Yes, exactly. Everyone keeps leaving the architecture's inductive structural priors out of the discussion.

It's not all about data! The model matters too!