FallUpJV
FallUpJV t1_jcje89y wrote
Reply to [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
I don't get it anymore: if it's not the model size nor the transformer architecture, then what is it?
Were the models just not trained enough / not trained on the right data?
FallUpJV t1_j5ya6t5 wrote
Reply to comment by manubfr in Few questions about scalability of chatGPT [D] by besabestin
This is something I often read, that other LLMs are undertrained, but how come the OpenAI one is the only one that isn't? Datasets? Computing power?
FallUpJV t1_j45sb9f wrote
Reply to comment by gaymuslimsocialist in [D] Has ML become synonymous with AI? by Valachio
Would you recommend any of those, or is there one in particular that introduces such techniques well?
Edit: I'm asking this because I'm not very familiar with what tree search means in the first place.
FallUpJV t1_j20ti8i wrote
That's funny, I searched Google for such a tool not so long ago thinking it already existed. Guess I was a few weeks early!
FallUpJV t1_jclpydo wrote
Reply to comment by xEdwin23x in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
Yes, it's definitely not small; I meant compared to the models people have been paying the most attention to over the last few years, I guess.
The astronaut pointing a gun meme is a good analogy, almost a scary one. I wonder how much we could improve existing models with simply better data.