Submitted by besabestin t3_10lp3g4 in MachineLearning
CKtalon t1_j5y87e5 wrote
Reply to comment by manubfr in Few questions about scalability of chatGPT [D] by besabestin
People often quote Chinchilla to claim there's still a lot of performance left to unlock, but we don't actually know how GPT-3.5 was trained. GPT-3.5 could very well be Chinchilla-optimal, even though the first version of davinci was not. We do know that OpenAI has retrained GPT-3, given the context length increases from 2048 to 4096 tokens, and apparently to around 8000 tokens for ChatGPT.
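For context, a back-of-the-envelope check with the usual Chinchilla approximations (training compute C ≈ 6ND, and roughly 20 tokens per parameter at the optimum, per Hoffmann et al. 2022) shows why the original 175B davinci, trained on ~300B tokens, is considered undertrained:

```python
# Back-of-the-envelope Chinchilla check (a sketch, not OpenAI's numbers).
# Assumes the common approximations from Hoffmann et al. (2022):
#   training FLOPs:   C ~= 6 * N * D
#   compute-optimal:  D ~= 20 tokens per parameter

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Rough compute-optimal token count for a model with n_params parameters."""
    return 20.0 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard estimate of total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

# Original GPT-3 davinci: 175B parameters trained on ~300B tokens.
n_params, d_actual = 175e9, 300e9
d_optimal = chinchilla_optimal_tokens(n_params)  # ~3.5T tokens

print(f"tokens used:      {d_actual:.2e}")
print(f"chinchilla-ish:   {d_optimal:.2e}")
print(f"undertrained by:  {d_optimal / d_actual:.1f}x")  # ~11.7x
```

By that heuristic, davinci was roughly an order of magnitude short on training data; whether GPT-3.5's retraining closed that gap is exactly the part we can't see from outside.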
manubfr t1_j5y8mo0 wrote
You're right, it could be that 3.5 is already using that approach. I guess the emergent-cognition tests haven't been published for GPT-3.5 yet (or have they?), so it's hard for us to measure performance as individuals. Someone could test text-davinci-003 on a bunch of cognitive tasks in the Playground, but I'm far too lazy to do that :)
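For anyone less lazy, here's a minimal sketch of doing that through the API instead of the Playground. It assumes the legacy openai Python package (pre-1.0), and the reasoning prompt is just an illustrative example, not a published benchmark:

```python
# Poke text-davinci-003 with a classic reasoning question.
# Assumes the legacy `openai` package (pip install "openai<1.0").
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, set your own key

prompt = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?\nA:"
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=64,
    temperature=0,  # keep output near-deterministic for evaluation
)

print(response["choices"][0]["text"].strip())
```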
CKtalon t1_j5y9deu wrote
There's also a rumor that Whisper was used to transcribe videos into a bigger text corpus for training GPT-4.
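If true, the pipeline would look something like this sketch using the open-source whisper package; the model size and file name here are assumptions, nothing OpenAI has confirmed:

```python
# Hypothetical corpus-building step: transcribe audio ripped from videos.
# Uses the open-source whisper package (pip install openai-whisper).
import whisper

model = whisper.load_model("base")  # larger checkpoints transcribe more accurately

# Transcribe one audio track; file name is a made-up example.
result = model.transcribe("lecture_audio.mp3")
print(result["text"][:200])  # first 200 chars of the recovered text
```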