p0p4ks t1_jcppzf4 wrote
Reply to Question on Attention by FunQuarter3511
I run into this confusion all the time. But then I remember that we are backpropagating the errors. Suppose your scenario happens and the model's output is wrong: backprop will adjust the key values that are too big or too small, and that in turn fixes the output.
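To make that a bit more concrete, here is a rough toy sketch in plain Python. Everything in it is my own assumption for illustration (a single scalar query, two scalar keys, made-up values, a squared-error loss, and the helper names `attend` and `train_keys`); it is not taken from the original question or from any real model. Key 0 starts out too big, so attention locks onto the wrong value, and gradient descent on the keys pulls it back.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    # Scalar dot-product attention: score_i = q * k_i.
    weights = softmax([q * k for k in keys])
    output = sum(w * v for w, v in zip(weights, values))
    return weights, output

def train_keys(q, keys, values, target, lr=1.0, steps=500):
    # Hypothetical helper: gradient descent on the keys only.
    keys = list(keys)
    _, out = attend(q, keys, values)
    initial_loss = 0.5 * (out - target) ** 2
    for _ in range(steps):
        weights, out = attend(q, keys, values)
        err = out - target  # dL/d(output) for L = 0.5 * (output - target)^2
        # Chain rule: d(output)/d(score_j) = w_j * (v_j - output),
        # and d(score_j)/d(k_j) = q.
        grads = [err * w * (v - out) * q for w, v in zip(weights, values)]
        keys = [k - lr * g for k, g in zip(keys, grads)]
    _, out = attend(q, keys, values)
    final_loss = 0.5 * (out - target) ** 2
    return initial_loss, final_loss, keys

# Key 0 is "too big", so attention starts out focused on the wrong value.
initial_loss, final_loss, learned_keys = train_keys(
    q=1.0, keys=[4.0, 0.0], values=[0.0, 1.0], target=1.0
)
print("loss before:", initial_loss)
print("loss after: ", final_loss)
print("keys after: ", learned_keys)
```

In this setup the oversized key shrinks and the other key grows, so the attention weight shifts toward the value that matches the target. In a real model the same error signal would also flow into the query and value projections, not just the keys.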