brownmamba94 t1_jd6j6pt wrote on March 22, 2023 at 4:44 AM

Reply to comment by kilow4tt in [R] SPDF - Sparse Pre-training and Dense Fine-tuning for Large Language Models by CS-fan-101

Hi, this is the first author on the paper. You asked a great question, and it’s something we are pursuing internally. In this study we kept things simple and switched from sparse to completely dense during fine-tuning. But as for future work, you’re right, we can certainly vary the amount of “redensification” as well (e.g., 25%, 50%, or possibly some schedule). This is a very interesting research direction, because the full dense capacity of the model may not be needed to recover performance on the downstream task.
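
A toy sketch of what partial redensification could look like, purely as an illustration (the study itself went fully dense). The helper names `redensify` and `density`, the flat 0/1 mask, and the random choice of which pruned weights to restore are all assumptions made for this example, not anything from the paper.

```python
import random


def redensify(mask, fraction, seed=0):
    """Return a copy of `mask` with `fraction` of its pruned (0) entries re-enabled.

    Hypothetical helper: `mask` is a flat list of 0/1 flags (1 = weight is
    trainable). Which pruned weights come back is picked uniformly at random,
    which is an assumption; other selection criteria are possible.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    pruned = [i for i, m in enumerate(mask) if m == 0]
    n_restore = round(fraction * len(pruned))
    new_mask = list(mask)
    for i in rng.sample(pruned, n_restore):
        new_mask[i] = 1
    return new_mask


def density(mask):
    """Fraction of weights that are active under `mask`."""
    return sum(mask) / len(mask)


# A 75% sparse mask over 16 weights: 4 active, 12 pruned.
mask = [1, 0, 0, 0] * 4

half = redensify(mask, 0.5)   # restores 6 of the 12 pruned weights
full = redensify(mask, 1.0)   # fully dense, as in the study

print(density(mask), density(half), density(full))  # 0.25 0.625 1.0
```

A schedule would just call `redensify` with a growing `fraction` over the course of fine-tuning; weights that were already active stay active.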