patient_zer00 t1_iujl1if wrote
Reply to [D] When the GPU is NOT the bottleneck...? by alexnasla
Disk IO is often a bottleneck.
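A quick way to check that (rough sketch, assuming PyTorch; the dataset you pass in is a placeholder): time a pass over the data with no model and no GPU work at all. If that alone is close to your full step time, the disk is the problem, not the GPU.

```python
import time
from torch.utils.data import DataLoader, Dataset

def time_data_loading(dataset: Dataset, batch_size: int = 64, num_workers: int = 4) -> float:
    """Time one full pass over the dataset: pure data loading, no model, no GPU."""
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, pin_memory=True)
    start = time.perf_counter()
    for _batch in loader:
        pass  # if this alone is slow, a faster GPU won't help
    return time.perf_counter() - start
```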
Also, even though a GPU will speed up LSTM training too, the forward and backward passes have to process the sequence one time step after another, and that recurrence can't be parallelized across time steps. That's probably why the speedup you're seeing going from a K80 to an A100 isn't that big.
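To make the sequential part concrete, here's a minimal sketch (plain PyTorch, toy sizes) of what an LSTM does under the hood: step t needs the hidden state from step t-1, so the loop over time steps has to run serially no matter how big the GPU is.

```python
import torch

seq_len, batch, in_dim, hidden = 100, 32, 64, 128
x = torch.randn(seq_len, batch, in_dim)

cell = torch.nn.LSTMCell(in_dim, hidden)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)

outputs = []
for t in range(seq_len):
    # step t consumes h and c from step t-1, so steps can't run concurrently
    h, c = cell(x[t], (h, c))
    outputs.append(h)
```

The per-step matrix multiplies still run on the GPU, which is why there's some speedup, but the serial loop over time steps caps it.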
Edit: typos
patient_zer00 t1_izuqszr wrote
Reply to [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
It doesn't remember anything itself; it's mostly the web app that remembers. It sometimes resends the previous prompts along with your current one (check the Chrome network logs). It then probably concatenates them and feeds them to the model as a single prompt.
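Something along these lines, roughly (illustrative sketch only; OpenAI hasn't published the actual format, so the template below is made up):

```python
def build_prompt(history: list[tuple[str, str]], new_message: str) -> str:
    """history holds (user_message, model_reply) pairs from earlier turns."""
    parts = [f"User: {user}\nAssistant: {reply}" for user, reply in history]
    parts.append(f"User: {new_message}\nAssistant:")
    return "\n".join(parts)

history = [("What's an LSTM?", "A recurrent neural network variant with gated memory cells.")]
print(build_prompt(history, "Can it be parallelized across time steps?"))
```

The model gets the whole conversation as one prompt on every turn, which is why it looks like it remembers.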