Submitted by Zatania t3_xw5hhl in MachineLearning
incrediblediy t1_ir4ssxg wrote
What is the "sequence length" in BERT?
minimaxir t1_ir6bisc wrote
The number of tokens in the input. Compute scales quadratically with sequence length, since self-attention compares every token against every other token.
Pretrained BERT takes in a maximum of 512 tokens.
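To make the 512-token limit concrete, here is a minimal sketch using the Hugging Face transformers tokenizer; the model name and example text are placeholders, not anything from OP's setup:

```python
from transformers import AutoTokenizer

# Hypothetical example: tokenize a sentence and cap it at BERT's 512-token limit.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "An example document that may be longer than the model's maximum sequence length."
encoded = tokenizer(
    text,
    max_length=512,    # pretrained BERT's position embeddings stop at 512 tokens
    truncation=True,   # anything beyond that limit is cut off
    padding="max_length",
    return_tensors="pt",
)

print(encoded["input_ids"].shape)  # -> torch.Size([1, 512])
```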
incrediblediy t1_ir7bljy wrote
Yeah! I mean, what is the seq_length used by OP :) and also the batch size :) I have tried seq_length = 300 but with a small batch size in Colab, especially with AdamW instead of Adam.
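For reference, a rough sketch of that kind of memory-conscious Colab setup; the seq_length, batch size, learning rate, and data here are illustrative assumptions, not OP's actual configuration:

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative values: shorter sequences and a small batch to fit a Colab GPU.
seq_length = 300
batch_size = 8

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Placeholder data standing in for a real dataset.
texts = ["example sentence one", "example sentence two"]
labels = torch.tensor([0, 1])

batch = tokenizer(
    texts,
    max_length=seq_length,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# AdamW (decoupled weight decay) instead of plain Adam.
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```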