
farmingvillein t1_jdj9w98 wrote

> these models are very sparse

Hmm, do you have any sources for this assertion?

It isn't entirely unreasonable, but 1) GPU speed-ups from sparsity aren't that large (unless OpenAI is doing something crazy secret/special... possible?), so this isn't actually that big of an upside (unless we're counting MoE?), and 2) OpenAI hasn't released architecture details (beyond the original GPT-3 paper, which did not indicate that the model was "very" sparse).