farmingvillein t1_jbk47jg wrote

> I emailed them that RWKV exactly met their desire for a way to train RNNs 'on the whole internet' in a reasonable time.
>
> So prior to a month ago they didn't know it existed or happened to meet their use case.

How does #2 follow from #1?

RWKV has been on reddit for quite a while, and many researchers frequent or lurk here, including DeepMind researchers, so the claim that they had no idea RWKV existed seems specious.

Unless you mean that you emailed them and they literally told you that they didn't know about this. In which case...good on you!

1

ThePerson654321 OP t1_jbk6nb4 wrote

Thanks! I also find it very unlikely that nobody from a large organisation (OpenAI, Microsoft, Google Brain, DeepMind, Meta, etc.) would have noticed it.

2

farmingvillein t1_jbk819k wrote

I think it is more likely that people have seen it but dismissed it as a bit quixotic, because the RWKV project has made little effort to iterate in an "academic" fashion (i.e., with rigorous, clear testing, benchmarks, goals, comparisons, etc.). It has obviously done pieces of this, but it hasn't been defined rigorously enough to make it easy for others to iterate on top of it from a research POV.

This means that anyone else picking up the architecture is going to have to go through the effort to create the whole necessary research baseline. Presumably this will happen, at some point (heck, maybe someone is doing it right now), but it creates a large impediment to further iteration/innovation.

11

LetterRip t1_jbkdshr wrote

Here is what the author stated in the thread:

> Tape-RNNs are really good (both in raw performance and in compression i.e. very low amount of parameters) but they just can't absorb the whole internet in a reasonable amount of training time... We need to find a solution to this!

I think they knew it existed (i.e., they knew there was a deep-learning project named RWKV), but they appear not to have known that it met their scaling needs.

2

farmingvillein t1_jbkx0co wrote

I don't understand the relevance here--tape-RNNs != RWKV, unless I misunderstand the RWKV architecture (certainly possible).
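
For what it's worth, my (possibly wrong) understanding is that RWKV's time-mixing is a decayed weighted average over past tokens carried in a constant-size state, not a differentiable external memory tape. A minimal numpy sketch of the WKV recurrence as I understand it from the RWKV repo (variable names are mine; the numerically stable max-shift trick is omitted):

```python
import numpy as np

def wkv(k, v, w, u):
    """Serial form of RWKV's WKV time-mixing (illustrative, numerically naive).

    k, v: (T, C) key/value sequences
    w:    (C,) positive per-channel decay rate
    u:    (C,) "bonus" weight applied only to the current token
    Returns a (T, C) array of decayed weighted averages of past values.
    """
    T, C = k.shape
    a = np.zeros(C)  # running decayed sum of exp(k_i) * v_i
    b = np.zeros(C)  # running decayed sum of exp(k_i)
    out = np.empty((T, C))
    for t in range(T):
        e_now = np.exp(u + k[t])                   # current token gets an extra boost
        out[t] = (a + e_now * v[t]) / (b + e_now)  # weighted average over i <= t
        a = np.exp(-w) * a + np.exp(k[t]) * v[t]   # decay state, absorb current token
        b = np.exp(-w) * b + np.exp(k[t])
    return out
```

The point being: the whole past is compressed into the fixed-size (a, b) state, which is what lets it train like a transformer and run like an RNN, whereas a tape-RNN reads and writes addressable slots of external memory. Different mechanisms, so results for one don't obviously transfer to the other.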

3