Submitted by super_deap t3_11tmpc5 in MachineLearning
lucidraisin t1_jczarq8 wrote
Reply to comment by antonb90 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
that isn't for decoders. encoder only, and it still needs to be verified. the majority of research papers never work out on closer examination. just trust me, stick with flash attention for now until further notice and save yourself a lot of headache
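A minimal sketch, not from the thread itself, of the PyTorch 2.0 native scaled_dot_product_attention the title refers to, assuming a CUDA device with fp16 inputs; the shapes and kernel flags below are illustrative only.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim); the flash kernel needs fp16/bf16 on CUDA.
q = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)

# Restrict kernel selection to the flash attention backend
# (without this context manager, PyTorch picks a backend automatically).
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal mask, decoder-style
```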
Unlucky_Excitement_2 t1_jczk2lm wrote
Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than flash attention. I notice on GitHub you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?
lucidraisin t1_jcznnvh wrote
actually, i'm keeping an eye on Hyena! there are, however, a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.
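For context on the efficiency claim in the question above (this sketch is not from the thread): the Hyena operator replaces attention with implicit long convolutions evaluated via FFT, which costs O(L log L) in sequence length rather than attention's O(L^2). A minimal sketch of that FFT long-convolution primitive, with hypothetical names and illustrative shapes; the real operator also parametrizes the filter implicitly and adds data-controlled gating.

```python
import torch

def fft_long_conv(u: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    """Convolve u (batch, channels, L) with a length-L filter (channels, L) via FFT."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT convolution acts as a linear (causal) convolution
    y = torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(filt, n=n), n=n)
    return y[..., :L]

# Illustrative 64k-token input; in Hyena the filter comes from a small network, not a raw parameter.
u = torch.randn(2, 16, 65536)
filt = torch.randn(16, 65536)
y = fft_long_conv(u, filt)  # shape (2, 16, 65536), computed in O(L log L)
```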
Unlucky_Excitement_2 t1_jczo8wf wrote
Those are actually super compelling problems. I'll keep an eye out. Again, thank you. You contribute so much.
lucidraisin t1_jczoelv wrote
yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email