ThePerson654321 OP t1_jbjz508 wrote

> I emailed them that RWKV exactly met their desire for a way to train RNNs 'on the whole internet' in a reasonable time. So prior to a month ago they didn't know it existed or happened to meet their use case.

That surprises me, considering his RWKV repos have thousands of stars on GitHub.

I'm curious about their response. What did they say?

> There was no evidence it was going to be interesting. There are lots of ideas that work on small models that don't work on larger models.

According to his claims (especially the infinite context length), it definitely was interesting. And that it scales was already pretty obvious at 7B.
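
To be clear about why "infinite ctx len" is even plausible: a recurrent model carries a fixed-size state from token to token, so there is no attention window to outgrow. A toy sketch of that property (a generic RNN cell, not RWKV's actual WKV recurrence):

```python
import numpy as np

def rnn_consume(tokens, d=8):
    """Toy recurrent reader: per-token compute and memory are O(d*d),
    independent of how many tokens came before -- unlike attention,
    whose KV cache grows with sequence length."""
    rng = np.random.default_rng(0)
    W_in = rng.normal(size=(d, d))     # placeholder weights, not trained
    W_state = rng.normal(size=(d, d))
    state = np.zeros(d)                # fixed-size state, never grows
    for x in tokens:                   # x: embedding of one token
        state = np.tanh(W_in @ x + W_state @ state)
    return state                       # summarizes the entire prefix

# Memory stays constant no matter how long the input is:
out = rnn_consume([np.ones(8)] * 100_000)
```

That constant-memory property is what the "infinite" claim refers to; whether quality actually holds up at long range is a separate question.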


But your argument is basically just that no large organization has noticed it yet.

My guess is that it actually has some unknown problem/limitation that makes it inferior to the transformer architecture.

We'll just have to wait. Hopefully you are right, but I doubt it.
