I_will_delete_myself t1_jb32fo5 wrote
What’s the reason to use this over a transformer? Transformers allow transfer learning and are easier to parallelize. Ah, I saw your Zhihu. What company do you work at?
Philpax t1_jb471z4 wrote
There's information about this in the README, but I'll admit that it's a little too technical and doesn't have a high-level description of the ideas. Looking forward to the paper!
I_will_delete_myself t1_jb532v5 wrote
Intelligence is the ability to distill complex information into an explanation simple enough for a child to understand.
It makes me skeptical when someone can't explain a method beyond performance numbers. Besides, most people just use the cloud, since ML networks of any size drain a lot of battery.
Philpax t1_jb53nhe wrote
As far as I can tell, the sparse documentation is just because they've been in pure R&D mode. I've played around with it in their Discord server and can confirm it does perform well, but I've struggled to get it working locally.
bo_peng OP t1_jb9bdw3 wrote
Directly from the RWKV-LM GitHub:
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
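For concreteness, here is a minimal sketch of the WKV recurrence at the heart of RWKV, reduced to a single channel. The function name and the scalar simplification are mine; the real implementation vectorizes over channels, adds numerical stabilization, and also has an equivalent parallel form used for GPT-style training:

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Simplified single-channel RWKV 'WKV' recurrence (no numerical
    stabilization). w: per-channel decay rate, u: 'bonus' applied to the
    current token, k, v: key/value sequences of shape (T,).
    The state (a, b) is O(1) per step, which is why inference stays cheap
    and ctx_len is effectively unbounded."""
    T = k.shape[0]
    a = 0.0  # running sum of exp(k_i) * v_i, decayed by exp(-w) each step
    b = 0.0  # running sum of exp(k_i), decayed the same way
    out = np.zeros(T)
    for t in range(T):
        # output mixes the decayed history with the current token (boosted by u)
        out[t] = (a + np.exp(u + k[t]) * v[t]) / (b + np.exp(u + k[t]))
        # fold the current token into the recurrent state
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]
        b = np.exp(-w) * b + np.exp(k[t])
    return out
```

The key point is that this computes a softmax-like weighted average over all past tokens without storing them, so memory does not grow with context length the way a transformer's KV cache does.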