Hsemar t1_jalp8as wrote on March 2, 2023 at 9:12 AM
Reply to comment by lucidraisin in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
But does FlashAttention help with auto-regressive generation? My understanding was that its benefit is avoiding materialization of the large attention (QK^T) matrix during training. At inference, generating one token at a time with KV caching, that shouldn't be very relevant, right?
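For concreteness, here is a minimal sketch of a single KV-cached decode step (PyTorch-style; shapes and names like `k_cache` are illustrative, not from any particular implementation). It shows the point behind the question: with one new query token, the score matrix is only 1×T per step, not the T×T matrix that FlashAttention avoids materializing.

```python
import torch
import torch.nn.functional as F

# One decode step with a KV cache (illustrative shapes only).
# B = batch, H = heads, T = tokens cached so far, D = head dimension.
B, H, T, D = 1, 16, 1024, 64

q_new = torch.randn(B, H, 1, D)    # query for the single new token
k_cache = torch.randn(B, H, T, D)  # keys cached from previous steps
v_cache = torch.randn(B, H, T, D)  # values cached from previous steps

# The score matrix here is (B, H, 1, T), not (B, H, T, T): the big
# T x T attention matrix never appears during single-token decoding.
scores = q_new @ k_cache.transpose(-1, -2) / D**0.5   # (B, H, 1, T)
weights = F.softmax(scores, dim=-1)
out = weights @ v_cache                                # (B, H, 1, D)
```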