
farmingvillein t1_jbkwkgl wrote

> most extraordinary claim I got stuck up on was "infinite" ctx_len.

All RNNs have that capability, on paper. The real question is how well the model actually remembers and utilizes things that happened long ago (in particular, beyond the window that a transformer has). With simpler RNN models, the answer is usually "not very well".
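
To make "infinite on paper" concrete, here's a toy PyTorch sketch (all dimensions are arbitrary): the hidden state stays the same fixed size whether the model has processed 10 tokens or 10,000, so anything from early in the sequence only survives to the extent it keeps getting re-compressed into that state.

```python
import torch
import torch.nn as nn

# Toy illustration: an RNN has no architectural context limit,
# but all history is squeezed into one fixed-size hidden state.
rnn = nn.RNN(input_size=64, hidden_size=128, batch_first=True)

short_seq = torch.randn(1, 10, 64)      # 10 tokens
long_seq = torch.randn(1, 10_000, 64)   # 10,000 tokens

_, h_short = rnn(short_seq)
_, h_long = rnn(long_seq)

# Same state size regardless of sequence length: no hard window,
# but also no guarantee early tokens are still represented.
print(h_short.shape)  # torch.Size([1, 1, 128])
print(h_long.shape)   # torch.Size([1, 1, 128])
```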

Which doesn't mean there can't be real upside here--just that it is not a clear slam dunk, and it has not been well studied or ablated. And obviously there has been a lot of work on extending transformer windows, too.
