Brudaks t1_iz4av37 wrote
Reply to comment by gkamer8 in [D] Simple Questions Thread by AutoModerator
My intuitive understanding is that transformers are so "powerful"/assumption-free that they are quite data-hungry and need far more than "a couple of books" to learn the structure.
If all you have is a couple of books, then IMHO a small RNN would give better results than a transformer (though still bad ones - "all the works of Shakespeare" seems like a reasonable minimum for decent results), and the inflection point where the transformer architecture starts to shine comes at much larger quantities of training data.
If you do want to try exactly that (even with overfitting), start from a short sequence length of, say, 4 or 8.
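In case it helps, here's a minimal sketch of what "start small" could look like: a tiny character-level GRU language model trained on a small corpus with a short sequence length. This isn't from the thread - it assumes PyTorch, and the file name `books.txt` plus all hyperparameters are just placeholders to illustrate the idea.

```python
# Minimal sketch (illustrative, not the poster's setup): a tiny character-level
# GRU language model trained with a very short sequence length.
# "books.txt" and all hyperparameters are assumptions for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F

SEQ_LEN = 8   # short context, per the suggestion to start at 4 or 8
HIDDEN = 256
EMBED = 64
BATCH = 64
STEPS = 2000

# Load a small corpus (e.g. a couple of books concatenated into one file).
text = open("books.txt", encoding="utf-8").read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

class CharGRU(nn.Module):
    def __init__(self, vocab):
        super().__init__()
        self.embed = nn.Embedding(vocab, EMBED)
        self.rnn = nn.GRU(EMBED, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # (batch, seq, vocab) next-char logits

model = CharGRU(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

for step in range(STEPS):
    # Sample random windows of SEQ_LEN + 1 chars; predict each next char.
    ix = torch.randint(0, len(data) - SEQ_LEN - 1, (BATCH,))
    x = torch.stack([data[i : i + SEQ_LEN] for i in ix])
    y = torch.stack([data[i + 1 : i + SEQ_LEN + 1] for i in ix])
    logits = model(x)
    loss = F.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 200 == 0:
        print(step, loss.item())
```

If this overfits and memorizes chunks of the books, that's expected at this data scale; the point is just to sanity-check the training loop before scaling up data and sequence length.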
gkamer8 t1_iz55n2z wrote
Thanks - since writing this, I got past that particular minimum with better initialization and a modified architecture, but it still isn’t generating terribly interesting text. I upped the dataset to about 10 books. I think I’ll download a proper large dataset to see if it can do any better. Thanks!