oathbreakerkeeper
oathbreakerkeeper t1_jd43931 wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
I'm using AMP mixed precision, which should be running in fp16, but it still requires training == False.
The torch code also disables flash attention if autocast is enabled, and I'm not sure how to deal with that one.
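Concretely, my setup looks roughly like this (a minimal sketch; the shapes are illustrative, not my actual training code):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 1024, 64, device="cuda")
k = torch.randn(1, 8, 1024, 64, device="cuda")
v = torch.randn(1, 8, 1024, 64, device="cuda")

# AMP casts the attention matmuls to fp16 inside this region, but the
# SDPA dispatcher can still decline the flash kernel under autocast.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = F.scaled_dot_product_attention(q, k, v)
```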
oathbreakerkeeper t1_jd0lu2p wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Am I looking in the wrong place? It seems like the torch 2.0 code still requires training==False in order to use FlashAttention:
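For what it's worth, this is how I've been probing which kernel gets picked, assuming the `torch.backends.cuda.sdp_kernel` context manager from 2.0. With only the flash backend enabled, SDPA should raise an error explaining why flash is ineligible instead of silently falling back to another kernel:

```python
import torch
import torch.nn.functional as F

# Flash attention expects half-precision inputs on CUDA.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Disable the math and mem-efficient fallbacks so a failed flash
# dispatch surfaces as an error rather than a silent fallback.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
```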
oathbreakerkeeper t1_jcee0rj wrote
Reply to comment by MysteryInc152 in [D] What do people think about OpenAI not releasing its research but benefiting from others’ research? Should google meta enforce its patents against them? by [deleted]
Where/when did they hint that?
oathbreakerkeeper t1_jc5viv0 wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Who is Emad? And what is SD?
oathbreakerkeeper t1_jc5vbgx wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
How was OpenAI's API used to bootstrap Alpaca?
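To make the question concrete: is it something like self-instruct, where you prompt text-davinci-003 to generate new instruction/response pairs and then fine-tune on those? A rough sketch of what I imagine, using the OpenAI Python client of the time (the prompt and parameters here are my guesses, not Stanford's actual template):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# Self-instruct-style bootstrap: ask the teacher model to invent a new
# instruction-following example to add to the fine-tuning dataset.
seed_prompt = (
    "Come up with a new instruction-following task.\n"
    "Instruction:"
)

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=seed_prompt,
    max_tokens=256,
    temperature=1.0,
)
print(resp["choices"][0]["text"])
```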
oathbreakerkeeper t1_j15vgip wrote
Reply to comment by blose1 in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
What's CoT?
oathbreakerkeeper t1_jdgjte0 wrote
Reply to comment by Dependent_Ad5120 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
How do you use pure fp16, out of curiosity? I've only ever trained with mixed precision, letting PyTorch handle the fp16 stuff from there.
Do you have an example of a GitHub repo that does it?
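To be concrete, is it as simple as casting everything once and skipping autocast/GradScaler entirely? My guess at what that would look like (a minimal sketch, not taken from any repo):

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda().half()  # weights cast to fp16 once
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(8, 1024, device="cuda", dtype=torch.float16)
target = torch.randn(8, 1024, device="cuda", dtype=torch.float16)

# No autocast and no GradScaler: activations, loss, and gradients
# all stay in fp16 end to end.
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
opt.step()
opt.zero_grad()
```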