
MysteryInc152 OP t1_jcputc0 wrote

It uses relative positional encoding, so long context is possible in theory, but because it was trained with a 2048-token context, performance gradually declines beyond that. Fine-tuning for a longer context wouldn't be impossible, though. A rough sketch of the idea is below.
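To illustrate what "relative" means here (this is not this model's actual code; ALiBi is just one well-known relative scheme, used as an example), the attention scores get a bias that depends only on the distance between positions, not on absolute position, which is why extrapolating past the training length degrades gradually rather than breaking outright:

```python
# Minimal ALiBi-style sketch of a relative position bias (illustrative only).
import torch

def relative_position_bias(seq_len: int, num_heads: int) -> torch.Tensor:
    positions = torch.arange(seq_len)
    # distances[i, j] = j - i; clamp so queries only attend backwards (causal).
    distances = (positions[None, :] - positions[:, None]).clamp(max=0)
    # One fixed negative slope per head, geometric as in ALiBi.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # Shape (num_heads, seq_len, seq_len); added to raw attention scores before softmax:
    # scores = q @ k.transpose(-2, -1) / d_head**0.5 + relative_position_bias(T, H)
    return slopes[:, None, None] * distances
```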

You can run it with FP16 (13 GB RAM), 8-bit (10 GB), or 4-bit (6 GB) quantization.
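The comment doesn't say which loader is meant, but assuming a Hugging Face checkpoint and the transformers + bitsandbytes stack, here's one way to hit those memory tiers (`some-org/some-model` is a placeholder, not the actual repo name):

```python
# Sketch only: loading the same checkpoint at FP16, 8-bit, or 4-bit precision.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "some-org/some-model"  # placeholder, not the real repo

# FP16 (highest memory use of the three options above)
model_fp16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# 8-bit quantization
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# 4-bit quantization (lowest memory use)
model_int4 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
```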
