I_will_delete_myself t1_jb32fo5 wrote
What’s the reason to use this over a transformer? Transformers allow transfer learning and are easier to parallelize. Ah, I saw your Zhihu. What company do you work at?
Philpax t1_jb471z4 wrote
There's information about this in the README, but I'll admit that it's a little too technical and doesn't have a high-level description of the ideas. Looking forward to the paper!
I_will_delete_myself t1_jb532v5 wrote
Intelligence is the ability to distill complex information into an explanation simple enough for a child to understand.
It makes me skeptical when someone can't explain a method beyond performance numbers. Besides, most people just use the cloud, since ML networks of any size drain a lot of battery.
Philpax t1_jb53nhe wrote
As far as I can tell, the sparse documentation is just because they've been in pure R&D mode. I've played around with it in their Discord server and can confirm it does perform well, but I've struggled to get it working locally.
bo_peng OP t1_jb9bdw3 wrote
Directly from the RWKV-LM GitHub:
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
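For concreteness, here is a minimal sketch of the WKV recurrence at the heart of RWKV, reduced to a single channel. The function name and the scalar simplification are mine; the real implementation vectorizes over channels, adds numerical stabilization, and also has an equivalent parallel form used for GPT-style training:

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Simplified single-channel RWKV 'WKV' recurrence (no numerical
    stabilization). w: per-channel decay rate, u: 'bonus' applied to the
    current token, k, v: key/value sequences of shape (T,).
    The state (a, b) is O(1) per step, which is why inference stays cheap
    and ctx_len is effectively unbounded."""
    T = k.shape[0]
    a = 0.0  # running sum of exp(k_i) * v_i, decayed by exp(-w) each step
    b = 0.0  # running sum of exp(k_i), decayed the same way
    out = np.zeros(T)
    for t in range(T):
        # output mixes the decayed history with the current token (boosted by u)
        out[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        # fold the current token into the recurrent state
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out
```

The key point is that this computes a softmax-like weighted average over all past tokens without storing them, so memory does not grow with context length the way a transformer's KV cache does.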