CKtalon t1_j5y87e5 wrote

People often cite Chinchilla to claim that there's still a lot of performance to be unlocked, but we do not know how GPT-3.5 was trained. GPT-3.5 could very well be Chinchilla-optimal, even though the first version of davinci was not. We do know that OpenAI has retrained GPT-3, given that the context length went from 2048 to 4096 tokens, and then to an apparent 8000ish tokens for ChatGPT.
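
For anyone wondering what "Chinchilla-optimal" means in numbers, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb of about 20 training tokens per parameter and the usual C ≈ 6·N·D estimate for training compute. Both are approximations, not the paper's fitted scaling law. The 175B-parameter, roughly 300B-token figures are the ones reported for the original GPT-3; nothing here says anything about how GPT-3.5 was actually trained.

```python
# Rule-of-thumb sketch only; not the fitted scaling law from the Chinchilla paper.
TOKENS_PER_PARAM = 20.0  # assumption: approximate Chinchilla tokens-per-parameter ratio


def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the common C ~ 6 * N * D estimate."""
    return 6.0 * params * tokens


def chinchilla_optimal(flops: float, ratio: float = TOKENS_PER_PARAM):
    """Split a compute budget into (params, tokens) such that tokens = ratio * params."""
    params = (flops / (6.0 * ratio)) ** 0.5
    return params, ratio * params


# Original GPT-3 davinci as reported in the GPT-3 paper: 175B params, ~300B tokens.
gpt3_params, gpt3_tokens = 175e9, 300e9
budget = train_flops(gpt3_params, gpt3_tokens)
opt_params, opt_tokens = chinchilla_optimal(budget)

print(f"GPT-3 tokens per parameter: {gpt3_tokens / gpt3_params:.2f}")
print(
    "Same compute at ~20 tokens/param: "
    f"{opt_params / 1e9:.0f}B params, {opt_tokens / 1e9:.0f}B tokens"
)
```

Under these assumptions, the original davinci saw under 2 tokens per parameter, which is why it is usually described as undertrained relative to Chinchilla.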

9

manubfr t1_j5y8mo0 wrote

You're right, it could be that 3.5 is already using that approach. As far as I know, the emergent cognition tests haven't been published for GPT-3.5 yet (or have they?), so it's hard for us to measure performance as individuals. Someone could test text-davinci-003 on a bunch of cognitive tasks in the Playground, but I'm far too lazy to do that :)

2

CKtalon t1_j5y9deu wrote

There's also a rumor going around that Whisper was used to gather a bigger text corpus from videos to train GPT-4.

3