Submitted by killver t3_y2vvne in MachineLearning
lostmsu t1_is5jeeb wrote
The transformer is awfully small (2 blocks, 200-dim embeddings, sequence length 35). I would discard that result as useless. They should be testing on GPT-2 (117M) or something similar.
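For scale, here is a minimal back-of-envelope sketch (not from the thread) comparing the criticized configuration against GPT-2 small; the ~12*d^2-per-block approximation, the 4x MLP expansion, and the vocabulary sizes are all assumptions:

```python
# Back-of-envelope transformer parameter count.
# Assumes a standard GPT-style block: attention weights ~4*d^2,
# feed-forward with 4x expansion ~8*d^2, so ~12*d^2 per block.
# Biases, layer norms, and other small terms are ignored.
def transformer_params(n_blocks, d_model, vocab_size=50257, seq_len=1024):
    per_block = 12 * d_model ** 2                   # attention + MLP weights
    embeddings = (vocab_size + seq_len) * d_model   # token + position tables
    return n_blocks * per_block, embeddings

# The model criticized above (vocab size is a guess, e.g. WikiText-2 scale).
tiny_body, tiny_emb = transformer_params(2, 200, vocab_size=33278, seq_len=35)
# GPT-2 small: 12 blocks, d_model=768, 50257 BPE vocab, 1024 context.
gpt2_body, gpt2_emb = transformer_params(12, 768)

print(f"tiny:  {tiny_body/1e6:.1f}M block params + {tiny_emb/1e6:.1f}M embeddings")
print(f"GPT-2: {gpt2_body/1e6:.1f}M block params + {gpt2_emb/1e6:.1f}M embeddings")
```

Under these assumptions the small model has under 1M non-embedding parameters, roughly 1% of GPT-2 small's ~85M, which is the gap being pointed at. (The commonly cited 117M figure for GPT-2 small undercounts slightly; with embeddings it is closer to 124M.)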
DanShawn t1_is5w6an wrote
Or even default BERT...