Submitted by super_deap t3_11tmpc5 in MachineLearning
lucidraisin t1_jczarq8 wrote
Reply to comment by antonb90 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
that isn't for decoders. encoder only, and it still needs to be verified. the majority of research papers never work out on closer examination. just trust me, stick with flash attention for now until further notice and save yourself a lot of headache
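A minimal sketch, not from the thread itself, of the PyTorch 2.0 native scaled_dot_product_attention the title refers to, assuming a CUDA device with fp16 inputs; the shapes and kernel flags below are illustrative only.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim); the flash kernel needs fp16/bf16 on CUDA.
q = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 32768, 64, device="cuda", dtype=torch.float16)

# Restrict kernel selection to the flash attention backend
# (without this context manager, PyTorch picks a backend automatically).
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal mask, decoder-style
```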
Unlucky_Excitement_2 t1_jczk2lm wrote
Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than flash attention. I notice on GitHub you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?
lucidraisin t1_jcznnvh wrote
actually, i'm keeping an eye on Hyena! there are, however, a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.
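For context on the efficiency claim in the question above (this sketch is not from the thread): the Hyena operator replaces attention with implicit long convolutions evaluated via FFT, which costs O(L log L) in sequence length rather than attention's O(L^2). A minimal sketch of that FFT long-convolution primitive, with hypothetical names and illustrative shapes; the real operator also parametrizes the filter implicitly and adds data-controlled gating.

```python
import torch

def fft_long_conv(u: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    """Convolve u (batch, channels, L) with a length-L filter (channels, L) via FFT."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT convolution acts as a linear (causal) convolution
    y = torch.fft.irfft(torch.fft.rfft(u, n=n) * torch.fft.rfft(filt, n=n), n=n)
    return y[..., :L]

# Illustrative 64k-token input; in Hyena the filter comes from a small network, not a raw parameter.
u = torch.randn(2, 16, 65536)
filt = torch.randn(16, 65536)
y = fft_long_conv(u, filt)  # shape (2, 16, 65536), computed in O(L log L)
```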
Unlucky_Excitement_2 t1_jczo8wf wrote
Those are actually super compelling problems. I'll keep an eye out. Again, thank you. You contribute so much.
lucidraisin t1_jczoelv wrote
yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email