lucidraisin t1_jczarq8 wrote

that isn't for decoders. it's encoder only, and still needs to be verified. the majority of research papers never work out on closer examination. just trust me, stick with flash attention for now until further notice and save yourself a lot of headache
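
if you're on pytorch 2.x, the easiest way in is the built-in scaled dot product attention, which dispatches to the flash kernel. a minimal sketch, assuming a recent pytorch with a supported gpu and dtype (shapes and sizes here are just toy values):

```python
import torch
import torch.nn.functional as F

# toy shapes: (batch, heads, seq_len, head_dim); flash attention wants fp16/bf16 on gpu
q = torch.randn(2, 8, 1024, 64, device='cuda', dtype=torch.float16)
k = torch.randn(2, 8, 1024, 64, device='cuda', dtype=torch.float16)
v = torch.randn(2, 8, 1024, 64, device='cuda', dtype=torch.float16)

# restrict the dispatcher to the flash kernel only; this errors out
# instead of silently falling back if flash can't be used
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # (2, 8, 1024, 64)
```

drop the context manager and pytorch will just pick the fastest available backend on its own, which is usually what you want in practice.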

2

Unlucky_Excitement_2 t1_jczk2lm wrote

Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, reportedly 100x more efficient than flash attention at that length. I notice on GitHub you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?

2

lucidraisin t1_jcznnvh wrote

actually, i'm keeping an eye on Hyena! there are however a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. the proof is in the pudding.
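
for context, the subquadratic claim comes from swapping attention for long convolutions evaluated with FFTs. this is not the full hyena operator (that adds data-controlled gating and an implicitly parameterized filter), just the core trick, as a minimal sketch:

```python
import torch

def fft_long_conv(u, k):
    # u: (batch, d_model, seq_len) input; k: (d_model, seq_len) long filter
    seqlen = u.shape[-1]
    fft_size = 2 * seqlen  # zero-pad so circular convolution behaves like a linear one
    u_f = torch.fft.rfft(u.float(), n=fft_size)
    k_f = torch.fft.rfft(k.float(), n=fft_size)
    # pointwise multiply in frequency space: O(n log n) vs attention's O(n^2)
    y = torch.fft.irfft(u_f * k_f, n=fft_size)[..., :seqlen]
    return y.type_as(u)

# e.g. a 64k-token sequence with a filter as long as the sequence itself
u = torch.randn(1, 256, 65536)
k = torch.randn(256, 65536)
y = fft_long_conv(u, k)  # (1, 256, 65536)
```

that O(n log n) scaling is where the claimed wins at 64k context come from; whether the quality holds up at scale is the part that still needs verification.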

2

Unlucky_Excitement_2 t1_jczo8wf wrote

Those are actually super compelling problems. I'll keep an eye out. Again, thank you, you contribute so much.

2

lucidraisin t1_jczoelv wrote

yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email.

1