oathbreakerkeeper
oathbreakerkeeper t1_jd43931 wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I'm using AMP mixed precision, which should be running in fp16, but it still requires training == False.
The torch code also disables flash attention if autocast is enabled, and I'm not sure how to deal with that one.
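Concretely, my setup looks roughly like this (a minimal sketch; the shapes are illustrative, not my actual training code):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 1024, 64, device="cuda")
k = torch.randn(1, 8, 1024, 64, device="cuda")
v = torch.randn(1, 8, 1024, 64, device="cuda")

# AMP casts the attention matmuls to fp16 inside this region, but the
# SDPA dispatcher can still decline the flash kernel under autocast.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = F.scaled_dot_product_attention(q, k, v)
```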
oathbreakerkeeper t1_jd0lu2p wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Am I looking in the wrong place? It seems like the torch 2.0 code still requires training==False in order to use FlashAttention:
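For what it's worth, this is how I've been probing which kernel gets picked, assuming the `torch.backends.cuda.sdp_kernel` context manager from 2.0. With only the flash backend enabled, SDPA should raise an error explaining why flash is ineligible instead of silently falling back to another kernel:

```python
import torch
import torch.nn.functional as F

# Flash attention expects half-precision inputs on CUDA.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Disable the math and mem-efficient fallbacks so a failed flash
# dispatch surfaces as an error rather than a silent fallback.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
```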
oathbreakerkeeper t1_jcee0rj wrote
Reply to comment by MysteryInc152 in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
Where/when did they hint that?
oathbreakerkeeper t1_jc5viv0 wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Who is Emad? And what is SD?
oathbreakerkeeper t1_jc5vbgx wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
How was OpenAI's API used to bootstrap Alpaca?
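To make the question concrete: is it something like self-instruct, where you prompt text-davinci-003 to generate new instruction/response pairs and then fine-tune on those? A rough sketch of what I imagine, using the OpenAI Python client of the time (the prompt and parameters here are my guesses, not Stanford's actual template):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# Self-instruct-style bootstrap: ask the teacher model to invent a new
# instruction-following example to add to the fine-tuning dataset.
seed_prompt = (
    "Come up with a new instruction-following task.\n"
    "Instruction:"
)

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=seed_prompt,
    max_tokens=256,
    temperature=1.0,
)
print(resp["choices"][0]["text"])
```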
oathbreakerkeeper t1_j15vgip wrote
Reply to comment by blose1 in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
What's CoT?
oathbreakerkeeper t1_jdgjte0 wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
How do you use pure fp16, out of curiosity? I've only ever trained with mixed precision, letting PyTorch handle the fp16 stuff from there.
Do you have an example of a GitHub repo that does it?
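To be concrete, is it as simple as casting everything once and skipping autocast/GradScaler entirely? My guess at what that would look like (a minimal sketch, not taken from any repo):

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda().half()  # weights cast to fp16 once
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(8, 1024, device="cuda", dtype=torch.float16)
target = torch.randn(8, 1024, device="cuda", dtype=torch.float16)

# No autocast and no GradScaler: activations, loss, and gradients
# all stay in fp16 end to end.
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
opt.step()
opt.zero_grad()
```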