lostmsu wrote:

The transformer is awfully small (2 blocks, embedding dimension 200, sequence length 35). I would discard that result as useless. They should be testing on GPT2-117M or something of similar scale.
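To put the scale gap in numbers, here is a rough back-of-the-envelope sketch of non-embedding parameter counts. It assumes a standard transformer block (Q/K/V/output attention projections plus an FFN with the usual 4x expansion) and ignores embeddings, biases, and LayerNorm; the exact architecture in the paper may differ.

```python
def approx_params(n_blocks: int, d_model: int) -> int:
    """Rough non-embedding parameter count for a standard transformer.

    Per block: attention projections (Q, K, V, O) ~ 4*d^2,
    FFN with 4x expansion ~ 8*d^2. Biases and LayerNorms ignored.
    """
    per_block = 4 * d_model**2 + 8 * d_model**2
    return n_blocks * per_block

# Model criticized above: 2 blocks, d_model = 200
print(approx_params(2, 200))    # 960,000 -> roughly 1M parameters

# GPT-2 117M: 12 blocks, d_model = 768
print(approx_params(12, 768))   # ~85M non-embedding parameters
```

Under these assumptions the tested model has on the order of 1M non-embedding parameters, almost two orders of magnitude smaller than GPT2-117M, which is why the result says little about models at realistic scale.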