Submitted by Zatania t3_xw5hhl in MachineLearning
incrediblediy t1_ir4ssxg wrote
What is the "sequence length" in BERT?
minimaxir t1_ir6bisc wrote
The number of tokens in the input. Compute scales quadratically with sequence length, since self-attention compares every token against every other token.
Pretrained BERT takes in a maximum of 512 tokens.
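To make the 512-token limit concrete, here is a minimal sketch using the Hugging Face transformers tokenizer; the model name and example text are placeholders, not anything from OP's setup:

```python
from transformers import AutoTokenizer

# Hypothetical example: tokenize a sentence and cap it at BERT's 512-token limit.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "An example document that may be longer than the model's maximum sequence length."
encoded = tokenizer(
    text,
    max_length=512,    # pretrained BERT's position embeddings stop at 512 tokens
    truncation=True,   # anything beyond that limit is cut off
    padding="max_length",
    return_tensors="pt",
)

print(encoded["input_ids"].shape)  # -> torch.Size([1, 512])
```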
incrediblediy t1_ir7bljy wrote
Yeah! I mean, what is the seq_length used by OP :) and also the batch size :) I have tried seq_length = 300 but with a small batch size in Colab, especially with AdamW instead of Adam.
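For reference, a rough sketch of that kind of memory-conscious Colab setup; the seq_length, batch size, learning rate, and data here are illustrative assumptions, not OP's actual configuration:

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative values: shorter sequences and a small batch to fit a Colab GPU.
seq_length = 300
batch_size = 8

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Placeholder data standing in for a real dataset.
texts = ["example sentence one", "example sentence two"]
labels = torch.tensor([0, 1])

batch = tokenizer(
    texts,
    max_length=seq_length,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)

# AdamW (decoupled weight decay) instead of plain Adam.
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```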