Why are bigger transformer models better learners? Submitted by begooboi t3_119zmpd on February 23, 2023 at 2:56 PM in deeplearning 15 comments 7
Dropkickmurph512 t1_j9pnws1 wrote on February 23, 2023 at 6:03 PM NTK (neural tangent kernel) theory kind of looks into this, but for a more general case. The math gets wild, though. The real answer is that no one knows the real reason.