Submitted by besabestin t3_10lp3g4 in MachineLearning
vivehelpme t1_j5y70zt wrote
>what is very special about the model other than the large data and parameter set it has
OpenAI has a good marketing department and the web interface is user-friendly. But yeah there's really no secret sauce to it.
The model generates the text snippet in a batch; it just prints it a character at a time for dramatic effect (and to keep you occupied for a while so you don't overload the horribly computationally expensive cloud service it runs on with multiple queries in quick succession). So yeah, there are definitely scaling questions to answer before it could be run as a general casual search engine replacing Google.
besabestin OP t1_j5ya2af wrote
I see. Interesting. I thought it was generating one by one like that. I wonder why it sometimes encounters an error after generating a long text and just stops halfway through the task - which happened to me frequently.
crt09 t1_j5ytazq wrote
The guy above was kind of unclear: it's an autoregressive language model, so it does generate one token at a time, put it back into the input, and generate the next one. It could have been printed out in one go once they waited for it to stop, then sent to the client and printed all at once, but they went with the fancy GUI type, possibly yeah as a way to slow down spamming.
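Roughly, the decode loop looks something like this; a minimal sketch using GPT-2 from Hugging Face as a stand-in, since ChatGPT's actual serving code isn't public:

```python
# Autoregressive decoding: each new token is appended to the input and fed
# back in to predict the next one. GPT-2 here is only an illustration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("The secret to large language models is",
                             return_tensors="pt")

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits            # (1, seq_len, vocab)
        next_id = logits[0, -1].argmax()            # greedy: most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
        # The server could stream each token to the client as soon as it is
        # sampled, or hold everything and send the finished text at once.
        print(tokenizer.decode(next_id.item()), end="", flush=True)
```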
visarga t1_j6c0e8m wrote
They might use a second model to flag abuse, not once every token, but once every line or phrase. Their models are already trained to avoid being abused, but this second model is like insurance in case the main one doesn't work.
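Something like this, purely as a hypothetical sketch; `generate_stream` and `abuse_classifier` are placeholder components, not anything OpenAI has published:

```python
# Run a separate abuse/policy classifier once per completed line of streamed
# output rather than on every token.
def moderate_stream(generate_stream, abuse_classifier, threshold=0.9):
    buffer = ""
    for token in generate_stream():            # tokens arrive one at a time
        buffer += token
        yield token
        if token.endswith("\n"):               # check once per line, not per token
            score = abuse_classifier(buffer)   # probability the text violates policy
            if score > threshold:
                yield "\n[response stopped by moderation]"
                return
            buffer = ""
```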
suntehnik t1_j5ykwei wrote
Just speculation here: maybe they store the generated text in a buffer, and when they run low on memory the buffer can be flushed to free the allocation for other tasks.
londons_explorer t1_j60m5ui wrote
This isn't true.
The model generates one token at a time, and if you look at the network connection you can see it slowly loading the response.
I'm pretty sure the answer is returned as fast as OpenAI can generate it on their cluster of GPUs.
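You can watch this yourself by calling the API with streaming turned on and reading the chunks as they arrive on the wire; a rough sketch against OpenAI's public completions endpoint (exact payload fields may differ):

```python
# Request a completion with "stream": true and print each chunk as it lands.
import json
import requests

resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"model": "text-davinci-003", "prompt": "Hello", "max_tokens": 50,
          "stream": True},
    stream=True,  # don't wait for the whole response before reading
)

for line in resp.iter_lines():
    if not line or line == b"data: [DONE]":
        continue
    chunk = json.loads(line.removeprefix(b"data: "))
    print(chunk["choices"][0]["text"], end="", flush=True)  # a few tokens at a time
```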
visarga t1_j6c01ua wrote
> But yeah there's really no secret sauce to it.
Of course there is - it's the data. They keep their mix of primary training sets of organic text, multi-task fine-tuning, code training, and RLHF secret. We only know in broad strokes what they are doing, but the details matter. How much code did they train on? It matters. How many tasks? 1,800 like FLAN-T5, or many more, like 10,000? We have no idea. Do they reuse the prompts to generate more training data? Possibly. Others don't have their API logs because they had no demo.