antonb90 t1_jczajd1 wrote

Things are improving fast.

>COLT5 is better at any speed. For 16k input length, COLT5 matches or exceeds LONGT5 quality for Large and XL with 35-75% training speedup and 50-100% inference speedup on top of the order-of-magnitude inference speedup from MQA. Encoder speedups are even greater (Appendix D). COLT5-XL also achieves SOTA performance on the SCROLLS benchmark


>COLT5 achieves both stronger performance and faster inference speed at all input lengths and is able to effectively make use of extremely long inputs. We note that COLT5 achieves large quality gains by going from 32k to 64k tokens even while keeping the number of routed tokens constant, providing more evidence for our hypothesis.

Google's new COLT5 handles inputs up to 64k tokens (a rough sketch of its token routing follows below):

https://arxiv.org/abs/2303.09752

1
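
The quoted passages hinge on CoLT5's conditional computation: a learned router picks a small set of "important" tokens that get a heavy feed-forward/attention path, while every token takes a cheap light path, so cost stays roughly flat as the input grows. Below is a minimal sketch of that routing idea, assuming a simple sigmoid-gated top-k router; the module names, layer sizes, and gating are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of CoLT5-style conditional routing (illustrative, not the authors' code).
import torch
import torch.nn as nn


class ConditionalFeedForward(nn.Module):
    """Every token passes a light MLP; only the top-k routed tokens also pass a
    heavier MLP, so the expensive compute does not grow with sequence length."""

    def __init__(self, dim: int, num_routed: int = 512):
        super().__init__()
        self.num_routed = num_routed
        self.light = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.heavy = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.router = nn.Linear(dim, 1)  # scores each token's "importance"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        out = self.light(x)                                   # light path for every token
        scores = self.router(x).squeeze(-1)                   # (batch, seq_len)
        k = min(self.num_routed, x.shape[1])
        topk = scores.topk(k, dim=-1).indices                 # positions of routed tokens
        idx = topk.unsqueeze(-1).expand(-1, -1, x.shape[-1])  # (batch, k, dim)
        routed = torch.gather(x, 1, idx)
        # gate the heavy output by the router score so routing stays differentiable
        gate = torch.sigmoid(torch.gather(scores, 1, topk)).unsqueeze(-1)
        heavy_out = self.heavy(routed) * gate
        return out.scatter_add(1, idx, heavy_out)             # add heavy results back at routed positions


if __name__ == "__main__":
    layer = ConditionalFeedForward(dim=64, num_routed=16)
    print(layer(torch.randn(2, 1024, 64)).shape)  # torch.Size([2, 1024, 64])
```

Note how `num_routed` stays fixed while `seq_len` grows, which is the point the second quote makes about going from 32k to 64k tokens with a constant number of routed tokens.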

lucidraisin t1_jczarq8 wrote

that isn't for decoders. it's encoder-only, and still needs to be verified. the majority of research papers never work out on closer examination. just trust me, stick with flash attention for now until further notice and save yourself a lot of headache (a minimal usage sketch follows below)

2
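
For context, the most accessible way to try FlashAttention today is the fused kernel behind PyTorch 2.x's `scaled_dot_product_attention`. The sketch below assumes a CUDA GPU, fp16 inputs, and PyTorch 2.0+; the shapes are arbitrary.

```python
# Hedged example: requesting the flash kernel through PyTorch's SDPA dispatcher.
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim); flash requires fp16/bf16 on CUDA
q = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 4096, 64, device="cuda", dtype=torch.float16)

# Restrict the dispatcher to the flash backend (errors out if it cannot be used).
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 4096, 64])
```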

Unlucky_Excitement_2 t1_jczk2lm wrote

Since you're the OG with this, can I pick your brain? You don't see value in Hyena Hierarchy? Inference with a 64k context window, but 100x more efficient than flash attention (a rough sketch of its FFT long convolution follows below). I noticed on GitHub that you plan on implementing flash attention in all your transformer-based models? HH perplexity actually scales with parameter count. Thoughts?

2
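
The efficiency claim above comes from Hyena replacing attention with long convolutions evaluated via the FFT, which costs O(L log L) in sequence length instead of attention's O(L^2). Here is a minimal sketch of just that FFT long-convolution step, assuming an explicit per-channel filter; the real Hyena Hierarchy parameterizes its filters implicitly and adds data-controlled gating, which is omitted here.

```python
# Minimal sketch of an FFT-based long convolution (the core sub-quadratic op in Hyena-style models).
import torch


def fft_long_conv(u: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Convolve a signal u of shape (batch, dim, seq_len) with a per-channel
    filter h of shape (dim, seq_len) in O(L log L) using the FFT."""
    seq_len = u.shape[-1]
    fft_size = 2 * seq_len                       # zero-pad to avoid circular wrap-around
    u_f = torch.fft.rfft(u, n=fft_size)
    h_f = torch.fft.rfft(h, n=fft_size)
    y = torch.fft.irfft(u_f * h_f, n=fft_size)   # pointwise multiply in frequency domain
    return y[..., :seq_len]                      # keep the causal/valid part


if __name__ == "__main__":
    u = torch.randn(2, 64, 8192)            # (batch, dim, seq_len): a 8k-token signal
    h = torch.randn(64, 8192) * 0.01        # one filter as long as the sequence, per channel
    print(fft_long_conv(u, h).shape)        # torch.Size([2, 64, 8192])
```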

lucidraisin t1_jcznnvh wrote

actually, i'm keeping an eye on Hyena! there are however a number of issues i still have with the paper (i'm not going to play reviewer 2, as it is not my place, nor is reddit a good forum for that), but i intend to reserve judgement and try it out on a few difficult problems like genomics and EEG later this year. proof is in the pudding.

2

Unlucky_Excitement_2 t1_jczo8wf wrote

Those are actually super compelling problems. I'll keep an eye out. Again, thank you; you contribute so much.

2

lucidraisin t1_jczoelv wrote

yea no problem, happy to chat more if you are doing research in this space. you can always reach out to me through email.

1