hfnuser0000 t1_jcnspad wrote on March 18, 2023 at 4:33 AM

Reply to comment by Art10001 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

Hi there! It sounds really interesting! Could you please share the name of the project or provide a link to it? I would love to check it out. Thank you!

hfnuser0000 t1_jcl5akd wrote on March 17, 2023 at 4:57 PM

Reply to comment by Spiritual-Reply5896 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap

How many ways are there to implement memory retrieval? Can someone explain the intuition behind it? Thank you in advance!

hfnuser0000 t1_j8qoshn wrote on February 16, 2023 at 6:50 AM

Reply to [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng

I am interested in the theoretical side of how your model works. In a transformer, tokens attend directly to other tokens. In an RNN, a piece of information can be preserved for later use, but at the cost of reduced memory capacity for other information, and once the information is lost, it's lost forever. So I think the context length of an RNN scales linearly with its memory capacity (and indirectly with the number of parameters), right?
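
To make the intuition concrete, here is a rough sketch of the contrast I have in mind. It is a toy setup of my own, not RWKV's actual architecture: `rnn_state_sizes` and `attention_cache_sizes` are hypothetical helpers, and the RNN uses a simplified per-unit (diagonal) recurrence with random weights. The point is only that the recurrent state stays the same size no matter how many tokens arrive, while a cache that keeps every token's keys and values grows with the sequence.

```python
import math
import random


def rnn_state_sizes(num_tokens, hidden_size, seed=0):
    """Run a toy tanh RNN and record the state size after each token."""
    rng = random.Random(seed)
    # Hypothetical toy weights: one recurrent and one input weight per unit.
    w_rec = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    sizes = []
    for _ in range(num_tokens):
        x = rng.gauss(0.0, 1.0)  # stand-in for a token embedding
        # The new state overwrites the old one: anything not carried
        # through this update is no longer recoverable.
        h = [math.tanh(a * hi + b * x) for a, b, hi in zip(w_rec, w_in, h)]
        sizes.append(len(h))
    return sizes


def attention_cache_sizes(num_tokens, hidden_size):
    """Count floats in a key/value cache that keeps every past token."""
    # One key vector and one value vector per token seen so far.
    return [2 * t * hidden_size for t in range(1, num_tokens + 1)]


if __name__ == "__main__":
    print(rnn_state_sizes(5, 8))
    print(attention_cache_sizes(5, 8))
```

So in this sketch the only way to give the RNN more room to remember is to raise `hidden_size`, which is what my question about context length scaling with memory capacity is getting at.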