Submitted by ryusan8989 t3_yzgwz5 in singularity
CosmicVo t1_ix2q8f1 wrote
Reply to comment by michael_mullet in 2023 predictions by ryusan8989
Scale is indeed not all we need. In fact, GPT-4 reportedly has fewer parameters than GPT-3, or about the same; nobody outside OpenAI knows for sure. Either way, the focus is shifting toward training data and hyperparameters (learning rate, batch size, sequence length, and so on): the goal is finding optimal models instead of just bigger ones. Exhaustive hyperparameter tuning is infeasible for the largest models, but done well it can yield a performance gain equivalent to doubling the parameter count.
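For anyone unfamiliar with what a hyperparameter sweep actually looks like, here's a minimal toy sketch in NumPy. Everything in it (the linear model, the data, the grid values) is made up for illustration and has nothing to do with GPT's actual training setup:

```python
import itertools

import numpy as np

# Toy data: a noisy linear relationship (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
true_w = np.array([1.5, -2.0, 0.5, 3.0])
y = X @ true_w + 0.1 * rng.normal(size=256)

def train_sgd(lr, batch_size, epochs=20):
    """Plain minibatch SGD on linear regression; returns final MSE."""
    w = np.zeros(4)
    n = len(X)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the mean squared error on this minibatch.
            grad = 2.0 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

# Grid over two of the hyperparameters mentioned above: each config
# means a full training run, which is exactly why this brute-force
# approach stops being feasible at GPT scale.
configs = list(itertools.product([0.001, 0.01, 0.1], [8, 32, 128]))
best = min(configs, key=lambda cfg: train_sgd(*cfg))
print("best (learning rate, batch size):", best)
```

Here the grid is 9 configs on a tiny model, so it runs in seconds. Scale the model up to billions of parameters and each grid point becomes a multi-million-dollar run, which is the infeasibility the comment is pointing at.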