Submitted by redditnit21 t3_y5qn9h in deeplearning
BrotherAmazing t1_iso8emo wrote
It’s what other people have already said: this is extremely common to see if you’re using dropout. There’s nothing necessarily wrong here either; this network might outperform a network trained without dropout (on test data, the data we actually care about!) even though the dropout-free network gets higher training accuracy.
Here is how you can prove it to yourself:
1. You can keep dropout activated at test time as an experiment and confirm that, with dropout left on, test accuracy does indeed drop back below training accuracy.

2. You can keep everything else fixed and parametrically dial down the dropout rate in each dropout layer. Usually 0.5 (50%) is the default, but for a fixed training/test split you’ll see that as that parameter goes 0.5 → 0.25 → 0.1 → 0.05 → 0, training accuracy climbs back to be at or above test accuracy. (A sketch of both experiments follows this list.)

3. You can also rule out the possibility that a rare split gave you an easy test set and a hard training set by splitting randomly over and over and seeing that this phenomenon is not rare but the norm across nearly all splits. If 1 and 2 above behave consistently with dropout being the reason, though, I see this last exercise as a waste of time unless you just want to win an argument against someone who insists it is due to a “bad” split. Anyone who really insists on that, rather than proposing it as one possible reason, doesn’t have much real-world experience using dropout. This is very common, nothing is wrong, and it’s a telltale sign of dropout.
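To make experiments 1 and 2 concrete, here is a minimal Keras sketch. MNIST and this tiny conv net are stand-ins, since OP’s architecture, dataset, and dropout rate are unknown; the pattern is what matters: calling the model with `training=True` keeps dropout active at inference, and rebuilding with smaller dropout rates closes the train/test gap.

```python
# Minimal sketch of experiments 1 and 2; MNIST and this small conv net
# are assumed stand-ins for OP's setup, not OP's actual model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None], x_test[..., None]  # add channel axis

def build_model(dropout_rate):
    return tf.keras.Sequential([
        layers.Rescaling(1.0 / 255),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),
    ])

def fit_and_score(model):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, tr = model.evaluate(x_train, y_train, verbose=0)
    _, te = model.evaluate(x_test, y_test, verbose=0)
    return tr, te

# Experiment 1: evaluate with dropout left ON (training=True at inference);
# test accuracy should fall back below training accuracy.
model = build_model(0.5)
tr, te = fit_and_score(model)
preds = model(x_test[:2000], training=True)   # dropout stays active here
te_on = np.mean(np.argmax(preds, axis=1) == y_test[:2000])
print(f"train={tr:.3f} test={te:.3f} test(dropout on)={te_on:.3f}")

# Experiment 2: dial the dropout rate down and watch training accuracy
# climb back to or above test accuracy.
for rate in [0.5, 0.25, 0.1, 0.05, 0.0]:
    tr, te = fit_and_score(build_model(rate))
    print(f"dropout={rate}: train={tr:.3f} test={te:.3f}")
```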
No_Slide_1942 t1_iso8k41 wrote
The same thing is happening without dropout. What should I do?
BrotherAmazing t1_iso8zyl wrote
If you have a different problem where this happens without dropout, then you may indeed want to make sure the training/test split isn’t a “bad” one and run k-fold cross-validation.
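A sketch of that k-fold check, reusing `build_model` and the MNIST arrays from the earlier sketch (the fold count and training budget are arbitrary choices):

```python
# K-fold check: if test > train on nearly every fold, a "bad" split
# cannot be the explanation. Reuses build_model and data from above.
import numpy as np
from sklearn.model_selection import KFold

x_all = np.concatenate([x_train, x_test])
y_all = np.concatenate([y_train, y_test])

kf = KFold(n_splits=5, shuffle=True, random_state=0)
gaps = []
for fold, (tr_idx, te_idx) in enumerate(kf.split(x_all)):
    model = build_model(dropout_rate=0.0)   # dropout already ruled out
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_all[tr_idx], y_all[tr_idx], epochs=3, verbose=0)
    _, tr = model.evaluate(x_all[tr_idx], y_all[tr_idx], verbose=0)
    _, te = model.evaluate(x_all[te_idx], y_all[te_idx], verbose=0)
    gaps.append(te - tr)
    print(f"fold {fold}: train={tr:.3f} test={te:.3f}")

print(f"mean (test - train) gap: {np.mean(gaps):+.3f}")
```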
The other thing to check is any regularizer applied during training but not at test time, since those make it harder for the network to score well on the training set; e.g., you can dial down data augmentation if you are using it, and so on.
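For the augmentation case, the same parametric dial-down works. A sketch using Keras preprocessing layers, which are active only during training; it reuses the imports and data from the first sketch, and the augmentation factors here are illustrative values, not anything from OP:

```python
# Augmentation layers (RandomRotation etc.) only fire during training,
# so they depress training accuracy the same way dropout does.
# strength=0.0 turns them into no-ops.
def build_augmented_model(strength):
    return tf.keras.Sequential([
        layers.Rescaling(1.0 / 255),
        layers.RandomRotation(0.1 * strength),
        layers.RandomTranslation(0.1 * strength, 0.1 * strength),
        layers.RandomZoom(0.1 * strength),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])

for strength in [1.0, 0.5, 0.0]:
    model = build_augmented_model(strength)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, tr = model.evaluate(x_train, y_train, verbose=0)
    _, te = model.evaluate(x_test, y_test, verbose=0)
    print(f"aug strength={strength}: train={tr:.3f} test={te:.3f}")
```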
People have touched on most of this already, but it is very common to see when using dropout layers.
redditnit21 OP t1_isoa0lc wrote
I commented from a different account by mistake. After looking at everyone’s comments, I tried without dropout and the same thing is happening. I am not using any data augmentation except rescaling (1/255).
BrotherAmazing t1_isosnle wrote
Then indeed I would try different randomized training/test set splits to rule that out as one step in the debugging.
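Something like the following, again reusing pieces from the sketches above (the number of trials and split fraction are arbitrary): resplit with a fresh seed each time and count how often test beats train.

```python
# Repeated random resplits: if test > train on almost every resplit,
# the split itself is not the cause. Reuses build_model, x_all, y_all.
from sklearn.model_selection import train_test_split

n_trials, wins = 10, 0
for seed in range(n_trials):
    x_tr, x_te, y_tr, y_te = train_test_split(
        x_all, y_all, test_size=0.2, random_state=seed)
    model = build_model(dropout_rate=0.0)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_tr, y_tr, epochs=3, verbose=0)
    _, tr = model.evaluate(x_tr, y_tr, verbose=0)
    _, te = model.evaluate(x_te, y_te, verbose=0)
    wins += te > tr
print(f"test beat train on {wins}/{n_trials} random splits")
```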