Submitted by besabestin t3_10lp3g4 in MachineLearning
londons_explorer t1_j60m5ui wrote
Reply to comment by vivehelpme in Few questions about scalability of chatGPT [D] by besabestin
This isn't true.
The model generates one token at a time, and if you watch the network connection you can see the response streaming in gradually.
I'm pretty sure the answer is returned as fast as OpenAI can generate it on their cluster of GPUs.
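For anyone curious, here's roughly what "one token at a time" looks like in code. This is just a sketch using GPT-2 via Hugging Face transformers as a stand-in (ChatGPT's actual model and serving stack aren't public), but the autoregressive loop is the same idea, and it's why the response streams in gradually instead of appearing all at once.

```python
# Minimal sketch of autoregressive (token-at-a-time) decoding.
# GPT-2 is used purely as an illustrative stand-in for a chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The model generates", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # generate 20 new tokens, one forward pass each
        logits = model(input_ids).logits            # [batch, seq_len, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1)   # greedy pick of the next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)
        # each new token can be streamed to the client as soon as it exists
        print(tokenizer.decode(next_id), end="", flush=True)
print()
```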