Complex_Candidate_28 t1_j67cx4a wrote on January 28, 2023 at 6:05 AM

Reply to comment by cthorrez in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents

Yes, model size also affects finetuning, but finetuning is much less sensitive to it.

Complex_Candidate_28 t1_j67aytx wrote on January 28, 2023 at 5:43 AM

Reply to comment by cthorrez in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents

Because for small LMs, ICL is unstable, i.e., it sometimes degrades to classifying all examples into one category. The protocol tries to ensure that ICL is analyzed only when it works well. (For much larger LMs, the performance variance would be much smaller, so this step can be skipped.)
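
To make the failure mode concrete, here is a minimal sketch of how one might flag runs where the predictions collapse into a single category. This is not the paper's actual protocol; the function names, the `threshold` parameter, and the toy `runs` data are all hypothetical and only illustrate the idea.

```python
from collections import Counter


def is_collapsed(predictions, threshold=1.0):
    """Return True if the most common predicted label makes up at least
    `threshold` of all predictions (1.0 = every example got the same label).
    Hypothetical helper, not taken from the paper."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    most_common_count = Counter(predictions).most_common(1)[0][1]
    return most_common_count / len(predictions) >= threshold


def keep_working_runs(runs, threshold=1.0):
    """Keep only the runs whose predictions did not collapse to one label."""
    return {
        name: preds
        for name, preds in runs.items()
        if not is_collapsed(preds, threshold)
    }


# Toy data (made up): one collapsed run, one run using both labels.
runs = {
    "seed_0": ["pos", "pos", "pos", "pos"],
    "seed_1": ["pos", "neg", "pos", "neg"],
}
print(sorted(keep_working_runs(runs)))
```

Lowering `threshold` (e.g. to 0.9) would also drop runs that are nearly, but not fully, collapsed.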

Complex_Candidate_28 t1_j675z5i wrote on January 28, 2023 at 4:52 AM

Reply to comment by cthorrez in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents

The purpose of the experiments is not to compare the performance of ICL and finetuning. The goal is to compare the mechanisms behind them, so it doesn't affect the conclusion itself. The point is to use the same set of examples for the analysis.