Submitted by AutoModerator t3_100mjlp in MachineLearning
v2thegreat t1_j2oablu wrote
Reply to comment by oilfee in [D] Simple Questions Thread by AutoModerator
For transformers, that's likely a difficult question to answer without experimentation, but I always recommend starting small. It's generally hard enough to go from 0 to 1 without also worrying about scaling things up.
Currently, we're seeing that the gains from scaling aren't really slowing down: larger and larger models continue to become more powerful.
I'd say that this deserves its own post rather than a simple question.
Good luck and please respond when you end up solving it!