visarga t1_jdtypz6 wrote on March 27, 2023 at 4:15 AM

Reply to comment by Haycart in [D] GPT4 and coding problems by enryu42

Doesn't autoregressive decoding cache the states for the previous tokens when decoding a new token?

Haycart t1_jdu7hlp wrote on March 27, 2023 at 5:55 AM

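Oh, you are probably correct. With the states for previous tokens cached, decoding token t only has to attend over the t tokens before it, so it'd be O(N^2) overall for autoregressive decoding. That still exceeds the O(N log N) that the linked post says is required for multiplication, though.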
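To make the counting concrete, here is a rough sketch that tallies attention-score computations for decoding N tokens. It is a simplified cost model, not a real transformer: the function names are made up for illustration, it counts only query-key pairs (ignoring layers, heads, and hidden size), and the no-cache case assumes full self-attention is recomputed over the whole prefix at every step.

```python
import math


def ops_with_cache(n: int) -> int:
    """Query-key pairs scored when decoding n tokens with cached states.

    Step t scores one new query against t cached keys, so the total is
    1 + 2 + ... + n = n(n+1)/2, i.e. O(n^2).
    """
    return sum(t for t in range(1, n + 1))


def ops_without_cache(n: int) -> int:
    """Same count, assuming the whole prefix is recomputed at every step.

    Step t scores t queries against t keys, so the total is
    1^2 + 2^2 + ... + n^2, i.e. O(n^3).
    """
    return sum(t * t for t in range(1, n + 1))


def n_log_n(n: int) -> float:
    """Reference curve for the O(n log n) bound mentioned above."""
    return n * math.log2(n)


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, ops_with_cache(n), ops_without_cache(n), round(n_log_n(n)))
```

Under these assumptions the cached total grows quadratically, which is well above n log n for any sizeable n but a full factor of n below the uncached total.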