CKtalon t1_j4enpew wrote on January 15, 2023 at 4:14 AM

GPT2 was trained on a different dataset (WebText, built from outbound Reddit links rather than CommonCrawl), which contains little code. GPT-Neo uses The Pile, which contains a lot of code.

GasZealousideal8691 OP t1_j4eo0ov wrote on January 15, 2023 at 4:17 AM

Oh, sorry if that wasn’t clear, but the stuff I’m training on isn’t code; it’s natural language.

WigglyHypersurface t1_j4gr979 wrote on January 15, 2023 at 4:53 PM

The amount of code in the training data might affect performance on specific tasks, even if the task itself involves no code. This may be especially true for tasks that require attention to long-range dependencies and abstract reasoning.

GasZealousideal8691 OP t1_j4gs8gc wrote on January 15, 2023 at 4:59 PM

But would it affect it to this extent? To be clear, this is not just "bad performance" or "horrendous performance". Our project loosely investigates how different editing methods perform on LMs given some datasets we made, and none of the editing methods, from fine-tuning to gradient-based methods, change the performance at all.
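A hedged debugging sketch (not from the original project): if no editing method moves the numbers at all, one cheap check is whether the edit changed any weights in the first place. The helper `changed_parameters` and the snapshot format (a mapping from parameter name to a flat list of floats, e.g. a state dict converted to lists beforehand) are assumptions for illustration only.

```python
# Hypothetical sanity check: did an edit actually change the model's weights?
# Assumes each snapshot is {parameter_name: flat_list_of_floats}, taken
# before and after the edit (e.g. a state dict converted to Python lists).
from typing import Dict, List


def changed_parameters(
    before: Dict[str, List[float]],
    after: Dict[str, List[float]],
    tol: float = 0.0,
) -> List[str]:
    """Return names of parameters whose values moved by more than `tol`."""
    if before.keys() != after.keys():
        raise ValueError("parameter names differ between snapshots")
    changed = []
    for name, old in before.items():
        new = after[name]
        if len(old) != len(new):
            raise ValueError(f"shape mismatch for {name}")
        # Largest absolute change anywhere in this parameter.
        max_delta = max((abs(a - b) for a, b in zip(old, new)), default=0.0)
        if max_delta > tol:
            changed.append(name)
    return changed


if __name__ == "__main__":
    # Toy snapshots with made-up names and values, purely for illustration.
    before = {"layer.0.attn": [0.1, 0.2], "layer.0.mlp": [0.3, 0.4]}
    after = {"layer.0.attn": [0.1, 0.2], "layer.0.mlp": [0.3, 0.45]}
    print(changed_parameters(before, after))  # ['layer.0.mlp']
```

An empty result after an edit would mean the edit was effectively a no-op, which would explain unchanged metrics without pointing at the data at all.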

Furthermore, GPT2 outputs equal accuracy and specificity values (specificity is basically the degree to which it "remembers" other, unrelated facts; the goal here is to minimize catastrophic forgetting), which makes absolutely 0 sense, because the two aren't even measured on the same scale. Accuracy is usually between 0 and 1, while specificity is usually ~26 by our measures.
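Along the same lines, a minimal sketch, assuming per-model metrics are collected into a plain dict (the helper name `suspicious_metric_pairs` and the model names below are hypothetical): two metrics on different scales coming out exactly equal usually means both values are read from the same variable or from a degenerate model output, so it can be worth flagging automatically.

```python
# Hypothetical check: metrics that live on different scales (e.g. accuracy
# in [0, 1] vs. a specificity score around 26) should essentially never be
# identical. If they are, suspect a shared variable or a degenerate output.
import math
from typing import Dict, List


def suspicious_metric_pairs(
    results: Dict[str, Dict[str, float]],
    metric_a: str = "accuracy",
    metric_b: str = "specificity",
    rel_tol: float = 1e-9,
) -> List[str]:
    """Return names of models whose two metrics are (nearly) equal."""
    flagged = []
    for model_name, metrics in results.items():
        if math.isclose(metrics[metric_a], metrics[metric_b], rel_tol=rel_tol):
            flagged.append(model_name)
    return flagged


if __name__ == "__main__":
    # Toy values, not real results.
    results = {
        "model_a": {"accuracy": 0.5, "specificity": 0.5},
        "model_b": {"accuracy": 0.5, "specificity": 26.0},
    }
    print(suspicious_metric_pairs(results))  # ['model_a']
```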

It doesn't have anything to do with the way accuracy/specificity are computed, because the code for GPT-Neo is identical minus the model= and tokenizer= statements, and it works fine for GPT-Neo. So there is something fundamentally crazy going on with GPT2...

WigglyHypersurface t1_j4gzjvd wrote on January 15, 2023 at 5:45 PM

If you're messing with the weights that deeply and directly, I'm not sure. But it smells like a bug to me.

GasZealousideal8691 OP t1_j4hk6kz wrote on January 15, 2023 at 7:52 PM

I’m fairly certain it’s something with the model. Like, even fine-tuning is giving these weird errors, when it had no problems with GPT-Neo.

We also ran this stuff on T5; obviously we had to configure the rest of the code differently, but it worked fine there as well.