Submitted by Apprehensive_Air8919 t3_11dfgfm in deeplearning
I'm currently working with the transformer architecture and doing depth estimation. My dataset is 6700 images of dimensions 3x256x256. I've run into a weird thing. My validation loss suddenly falls a lot around epoch 30-40 while my training loss barely does. I can't seem to find out why it is happening. Hope you can help me! I use Adam with lr=0.000001
The code for the vision transformer is here.
yannbouteiller t1_ja8cd7n wrote
That is pretty strange indeed. Perhaps this is an effect of dropout?
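For context on the dropout suggestion: training loss is usually computed with dropout active (units randomly zeroed), while validation loss is computed in eval mode with dropout disabled, so validation loss can legitimately sit below training loss. A minimal sketch of the effect, using a toy MLP as a hypothetical stand-in for the poster's vision transformer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression model with heavy dropout (stand-in, not the poster's ViT)
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # only active in train mode
    nn.Linear(64, 1),
)

x = torch.randn(256, 16)
y = torch.randn(256, 1)
loss_fn = nn.MSELoss()

model.train()  # dropout ON: half the hidden units are zeroed at random
with torch.no_grad():
    train_mode_loss = loss_fn(model(x), y).item()

model.eval()   # dropout OFF: the full network is used
with torch.no_grad():
    eval_mode_loss = loss_fn(model(x), y).item()

# The two losses differ on identical data purely because of dropout mode
print(train_mode_loss, eval_mode_loss)
```

The same mismatch applies to batch norm and other layers whose behavior differs between `model.train()` and `model.eval()`, which is why training and validation losses measured in different modes are not directly comparable.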