Submitted by super_deap t3_11tmpc5 in MachineLearning
Unlucky_Excitement_2 t1_jczk2lm wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than flash attention at that length. I noticed on GitHub that you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?
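For context on the thread title: the native flash attention shipped in PyTorch 2.0 is exposed through torch.nn.functional.scaled_dot_product_attention, which dispatches to a flash kernel when the inputs allow it. A minimal sketch of forcing that kernel (shapes, dtype, and the 32k sequence length below are illustrative, not taken from the thread):

```python
# Minimal sketch of PyTorch 2.0's native flash attention via
# torch.nn.functional.scaled_dot_product_attention (SDPA).
import torch
import torch.nn.functional as F

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
dtype = torch.float16 if use_cuda else torch.float32
# The flash kernel needs fp16/bf16 on CUDA; on CPU fall back to a short
# sequence so the math backend doesn't allocate a 32k x 32k attention matrix.
seq_len = 32768 if use_cuda else 1024
batch, heads, head_dim = 1, 8, 64

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

if use_cuda:
    # Restrict SDPA to the flash kernel only (PyTorch 2.0 API).
    with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                        enable_math=False,
                                        enable_mem_efficient=False):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
else:
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # (batch, heads, seq_len, head_dim)
```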
lucidraisin t1_jcznnvh wrote
actually, i'm keeping an eye on Hyena! there are however a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.
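For readers wondering where Hyena's subquadratic scaling comes from: its operator replaces attention with long, implicitly parameterized convolutions evaluated with the FFT, which costs O(L log L) in sequence length rather than attention's O(L^2). A toy sketch of that FFT long-convolution primitive follows; it is not the actual Hyena operator (no implicit filter parameterization or gating), and the shapes and random filter are made up for illustration:

```python
# Toy FFT-based causal long convolution, the O(L log L) primitive that
# Hyena-style models build on. The filter here is a random stand-in for
# Hyena's learned implicit filter.
import torch

def fft_long_conv(u: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Causal convolution of u (batch, dim, L) with a per-channel filter h (dim, L)."""
    L = u.shape[-1]
    # Zero-pad to 2L so the circular convolution implied by the FFT equals
    # a linear (causal) convolution on the first L outputs.
    u_f = torch.fft.rfft(u, n=2 * L)
    h_f = torch.fft.rfft(h, n=2 * L)
    y = torch.fft.irfft(u_f * h_f, n=2 * L)[..., :L]
    return y

batch, dim, L = 1, 64, 65536  # 64k-token sequence, illustrative sizes
u = torch.randn(batch, dim, L)
h = torch.randn(dim, L) / L   # stand-in for the implicit long filter
print(fft_long_conv(u, h).shape)  # (1, 64, 65536)
```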
Unlucky_Excitement_2 t1_jczo8wf wrote
Those are actually super compelling problems. I'll keep an eye out. Again, thank you; you contribute so much.
lucidraisin t1_jczoelv wrote
yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email