hfnuser0000
hfnuser0000 t1_jcl5akd wrote
Reply to comment by Spiritual-Reply5896 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
How many ways are there to implement memory retrieval? Can someone explain the intuition behind it? Thank you in advance!
hfnuser0000 t1_j8qoshn wrote
Reply to [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
I am interested in the theoretical side of how your model works. Take transformers: tokens attend to other tokens. In an RNN, a piece of information can be preserved for later use, but at the cost of reducing the memory capacity available for other information, and once that information is lost, it is lost forever. So I think the context length of an RNN scales linearly with its memory capacity (and indirectly with the number of parameters), right?
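To make the contrast concrete, here is a rough toy sketch of the two kinds of memory. It is not RWKV's actual update rule: the exponential-decay recurrence, the function names (`rnn_run`, `attention_cache`), and the sizes are all assumptions picked for illustration. The recurrent state stays at a fixed size and older inputs fade, while the attention-style cache keeps every token and grows with the sequence length.

```python
def rnn_run(tokens, d=4, decay=0.5):
    """Toy recurrence (assumed, not RWKV): fixed-size state, old content decays."""
    state = [0.0] * d
    for x in tokens:
        # Each step blends the new token in and shrinks everything stored before.
        state = [decay * s + (1 - decay) * xi for s, xi in zip(state, x)]
    return state

def attention_cache(tokens):
    """Toy attention memory: store every token so later steps can look it up."""
    cache = []
    for x in tokens:
        cache.append(list(x))
    return cache

# Eight hypothetical 4-dimensional token vectors.
tokens = [[float(i)] * 4 for i in range(8)]

state = rnn_run(tokens)
cache = attention_cache(tokens)

rnn_floats = len(state)                       # constant, independent of length
attn_floats = sum(len(row) for row in cache)  # grows linearly with length

# Weight of the first token inside the final recurrent state:
# (1 - decay) * decay ** (T - 1) with decay = 0.5 and T = 8.
first_weight = (1 - 0.5) * 0.5 ** (len(tokens) - 1)

print(rnn_floats, attn_floats, first_weight)
```

In this toy setup the only way to keep the first token around longer is to decay more slowly or to use a larger state, which is the capacity trade-off the question is about.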
hfnuser0000 t1_jcnspad wrote
Reply to comment by Art10001 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Hi there! It sounds really interesting! Could you please share the name of the project or provide a link to it? I would love to check it out. Thank you!