Submitted by super_deap t3_11tmpc5 in MachineLearning
Unlucky_Excitement_2 t1_jczk2lm wrote
Reply to comment by lucidraisin in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than flash attention at that length. I noticed on GitHub that you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?
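For context on the thread title: the native flash attention shipped in PyTorch 2.0 is exposed through torch.nn.functional.scaled_dot_product_attention, which dispatches to a flash kernel when the inputs allow it. A minimal sketch of forcing that kernel (shapes, dtype, and the 32k sequence length below are illustrative, not taken from the thread):

```python
# Minimal sketch of PyTorch 2.0's native flash attention via
# torch.nn.functional.scaled_dot_product_attention (SDPA).
import torch
import torch.nn.functional as F

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
dtype = torch.float16 if use_cuda else torch.float32
# The flash kernel needs fp16/bf16 on CUDA; on CPU fall back to a short
# sequence so the math backend doesn't allocate a 32k x 32k attention matrix.
seq_len = 32768 if use_cuda else 1024
batch, heads, head_dim = 1, 8, 64

q = torch.randn(batch, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

if use_cuda:
    # Restrict SDPA to the flash kernel only (PyTorch 2.0 API).
    with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                        enable_math=False,
                                        enable_mem_efficient=False):
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
else:
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # (batch, heads, seq_len, head_dim)
```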
lucidraisin t1_jcznnvh wrote
actually, i'm keeping an eye on Hyena! there are however a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.
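For readers wondering where Hyena's subquadratic scaling comes from: its operator replaces attention with long, implicitly parameterized convolutions evaluated with the FFT, which costs O(L log L) in sequence length rather than attention's O(L^2). A toy sketch of that FFT long-convolution primitive follows; it is not the actual Hyena operator (no implicit filter parameterization or gating), and the shapes and random filter are made up for illustration:

```python
# Toy FFT-based causal long convolution, the O(L log L) primitive that
# Hyena-style models build on. The filter here is a random stand-in for
# Hyena's learned implicit filter.
import torch

def fft_long_conv(u: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Causal convolution of u (batch, dim, L) with a per-channel filter h (dim, L)."""
    L = u.shape[-1]
    # Zero-pad to 2L so the circular convolution implied by the FFT equals
    # a linear (causal) convolution on the first L outputs.
    u_f = torch.fft.rfft(u, n=2 * L)
    h_f = torch.fft.rfft(h, n=2 * L)
    y = torch.fft.irfft(u_f * h_f, n=2 * L)[..., :L]
    return y

batch, dim, L = 1, 64, 65536  # 64k-token sequence, illustrative sizes
u = torch.randn(batch, dim, L)
h = torch.randn(dim, L) / L   # stand-in for the implicit long filter
print(fft_long_conv(u, h).shape)  # (1, 64, 65536)
```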
Unlucky_Excitement_2 t1_jczo8wf wrote
Those are actually super compelling problems. I'll keep an eye out. Again, thank you; you contribute so much.
lucidraisin t1_jczoelv wrote
yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email