CKtalon t1_j5y87e5 wrote on January 26, 2023 at 11:44 AM

Reply to comment by manubfr in Few questions about scalability of chatGPT [D] by besabestin

People often cite Chinchilla to argue that there's still a lot of performance left to unlock, but we don't actually know how GPT-3.5 was trained. GPT-3.5 could very well be Chinchilla-optimal, even though the first version of davinci was not. We do know that OpenAI has retrained GPT-3, since the context length went from 2048 to 4096 tokens, and apparently to around 8000 tokens for ChatGPT.
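For anyone unsure what "Chinchilla-optimal" means in numbers, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter and the usual training-compute estimate C ≈ 6 * N * D. The 175B parameters and ~300B tokens are the figures reported for the original GPT-3. The function names are just made up for illustration, and none of this says anything about how GPT-3.5 was actually trained.

```python
import math

# Assumption: the widely quoted Chinchilla rule of thumb of ~20 training tokens
# per parameter. This is an approximation, not an exact scaling law.
TOKENS_PER_PARAM = 20


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens for a given model size."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: C ~= 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens


def optimal_params_for_budget(flops: float) -> float:
    """Invert C = 6 * N * D with D = 20 * N, i.e. C = 120 * N**2, and solve for N."""
    return math.sqrt(flops / (6 * TOKENS_PER_PARAM))


if __name__ == "__main__":
    # Original GPT-3 davinci as reported: 175B parameters, ~300B training tokens.
    n, d = 175e9, 300e9
    budget = training_flops(n, d)
    optimal_d = chinchilla_optimal_tokens(n)
    print(f"Chinchilla-optimal tokens for 175B params: {optimal_d:.2e}")
    print(f"Tokens actually used: {d:.2e} ({d / optimal_d:.0%} of that)")
    print(f"Compute-optimal size for the same budget: {optimal_params_for_budget(budget):.2e} params")
```

Under these assumptions the original davinci saw only about 9% of the ~3.5T tokens the rule of thumb suggests, and the same compute budget would have been "optimal" for a model of roughly 51B parameters trained on about 1T tokens.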

manubfr t1_j5y8mo0 wrote on January 26, 2023 at 11:48 AM

You're right, it could be that GPT-3.5 is already using that approach. I don't think the emergent cognition tests have been published for GPT-3.5 yet (or have they?), so it's hard for us as individuals to measure its performance. Someone could test text-davinci-003 on a bunch of cognitive tasks in the Playground, but I'm far too lazy to do that :)

CKtalon t1_j5y9deu wrote on January 26, 2023 at 11:57 AM

The rumor mill also has it that Whisper was used to transcribe videos to build a bigger text corpus for training GPT-4.