Submitted by super_deap t3_11tmpc5 in MachineLearning
mike94025 t1_je5ojaw wrote
Reply to comment by tripple13 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
It is. Follow the call tree into F.multi_head_attention_forward.
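
One way to follow that call tree without opening the file by hand is to scan the function's source for the fused-attention entry point. This is only a rough sketch, assuming PyTorch 2.0 or later is installed. The helper name `calls_sdpa` is made up for illustration, and the search string `scaled_dot_product_attention` is an assumption about which call the comment is pointing at.

```python
import inspect

try:
    # Assumption: PyTorch 2.0+ is installed; otherwise the check is skipped.
    import torch.nn.functional as F
except ImportError:
    F = None


def calls_sdpa(func) -> bool:
    """Hypothetical helper: True if the Python source of `func`
    mentions scaled_dot_product_attention anywhere."""
    return "scaled_dot_product_attention" in inspect.getsource(func)


if F is not None:
    # Prints whether the name appears in multi_head_attention_forward's source.
    print(calls_sdpa(F.multi_head_attention_forward))
else:
    print("PyTorch not installed; nothing to inspect")
```

A text match only shows that the name appears in the source, not that a particular kernel is selected at runtime, so reading the surrounding branch in the source is still worthwhile.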