farmingvillein t1_jbkwkgl wrote
Reply to comment by ThePerson654321 in [D] Why isn't everyone using RWKV if it's so much better than transformers? by ThePerson654321
> most extraordinary claim I got stuck up on was "infinite" ctx_len.
All RNNs have that capability, on paper. The real question is how well the model actually remembers and utilizes things that happened a long time ago (beyond the window a transformer has, e.g.). In simpler RNN models, the answer is usually "not very".
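A minimal sketch of why the claim is true "on paper" but lossy in practice (a hypothetical toy example, not RWKV's actual architecture): an RNN carries a fixed-size hidden state, so it can consume a stream of any length, but everything it "remembers" has to be squeezed into that one state.

```python
import torch
import torch.nn as nn

# Toy RNN with a fixed-size hidden state (sizes are arbitrary for illustration).
rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)

state = None
# Arbitrarily long stream of 8-token chunks -- "infinite" context on paper.
for chunk in torch.randn(1000, 1, 8, 16):
    out, state = rnn(chunk, state)  # state stays shape (1, 1, 32) forever

# Contrast: a vanilla transformer only attends over a bounded window of
# tokens, but within that window nothing has to be compressed away.
```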
Which doesn't mean there can't be real upside here--just that it's not a clear slam dunk, and that it hasn't been well studied/ablated. And obviously there has been a lot of work on extending transformer windows, too.