WigglyHypersurface t1_j4gzjvd wrote

If you're messing with the weights that deeply and directly, I'm not sure. But it smells like a bug to me.

GasZealousideal8691 OP t1_j4hk6kz wrote

I'm fairly certain it's something with the model. Even fine-tuning is giving these weird errors, when it had no problems with GPT-Neo.

We also ran this on T5. We obviously had to configure the rest of the code differently, but it worked fine there as well.
