Submitted by redditnit21 t3_y5qn9h in deeplearning
BrotherAmazing t1_iso8emo wrote
It’s what other people have already said: this is extremely common to see if you’re using dropout. There’s nothing necessarily wrong here either; this network might outperform a network trained without dropout (on test data, the data we actually care about!) even though the dropout-free network gets higher training accuracy.
Here is how you can prove it to yourself:
1. You can keep dropout activated at test time as an experiment and confirm that, with dropout left on, test accuracy does indeed drop back below training accuracy.

2. You can keep everything else fixed and parametrically dial down the dropout rate in each dropout layer. Usually 0.5 (50%) is the default, but for a fixed training/test split you’ll see that as that parameter goes 0.5 → 0.25 → 0.1 → 0.05 → 0, training accuracy climbs back to be at or above test accuracy. (A sketch of both experiments follows this list.)

3. You can also rule out the possibility that a rare split gave you an easy test set and a hard training set by splitting randomly over and over and seeing that this phenomenon is not rare but the norm across nearly all splits. If 1 and 2 above behave consistently with dropout being the reason, though, I see this last exercise as a waste of time unless you just want to win an argument against someone who insists it is due to a “bad” split. Anyone who really insists on that, rather than proposing it as one possible reason, doesn’t have much real-world experience using dropout. This is very common, nothing is wrong, and it’s a telltale sign of dropout.
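To make experiments 1 and 2 concrete, here is a minimal Keras sketch. MNIST and this tiny conv net are stand-ins, since OP’s architecture, dataset, and dropout rate are unknown; the pattern is what matters: calling the model with `training=True` keeps dropout active at inference, and rebuilding with smaller dropout rates closes the train/test gap.

```python
# Minimal sketch of experiments 1 and 2; MNIST and this small conv net
# are assumed stand-ins for OP's setup, not OP's actual model.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None], x_test[..., None]  # add channel axis

def build_model(dropout_rate):
    return tf.keras.Sequential([
        layers.Rescaling(1.0 / 255),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),
    ])

def fit_and_score(model):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, tr = model.evaluate(x_train, y_train, verbose=0)
    _, te = model.evaluate(x_test, y_test, verbose=0)
    return tr, te

# Experiment 1: evaluate with dropout left ON (training=True at inference);
# test accuracy should fall back below training accuracy.
model = build_model(0.5)
tr, te = fit_and_score(model)
preds = model(x_test[:2000], training=True)   # dropout stays active here
te_on = np.mean(np.argmax(preds, axis=1) == y_test[:2000])
print(f"train={tr:.3f} test={te:.3f} test(dropout on)={te_on:.3f}")

# Experiment 2: dial the dropout rate down and watch training accuracy
# climb back to or above test accuracy.
for rate in [0.5, 0.25, 0.1, 0.05, 0.0]:
    tr, te = fit_and_score(build_model(rate))
    print(f"dropout={rate}: train={tr:.3f} test={te:.3f}")
```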
No_Slide_1942 t1_iso8k41 wrote
The same thing is happening without dropout. What should I do?
BrotherAmazing t1_iso8zyl wrote
If you have a different problem where this happens without dropout, then you may indeed want to make sure the training/test split isn’t a “bad” one and run k-fold cross-validation.
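A sketch of that k-fold check, reusing `build_model` and the MNIST arrays from the earlier sketch (the fold count and training budget are arbitrary choices):

```python
# K-fold check: if test > train on nearly every fold, a "bad" split
# cannot be the explanation. Reuses build_model and data from above.
import numpy as np
from sklearn.model_selection import KFold

x_all = np.concatenate([x_train, x_test])
y_all = np.concatenate([y_train, y_test])

kf = KFold(n_splits=5, shuffle=True, random_state=0)
gaps = []
for fold, (tr_idx, te_idx) in enumerate(kf.split(x_all)):
    model = build_model(dropout_rate=0.0)   # dropout already ruled out
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_all[tr_idx], y_all[tr_idx], epochs=3, verbose=0)
    _, tr = model.evaluate(x_all[tr_idx], y_all[tr_idx], verbose=0)
    _, te = model.evaluate(x_all[te_idx], y_all[te_idx], verbose=0)
    gaps.append(te - tr)
    print(f"fold {fold}: train={tr:.3f} test={te:.3f}")

print(f"mean (test - train) gap: {np.mean(gaps):+.3f}")
```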
The other thing to check is any regularizer applied during training but not at test time, since those make it harder for the network to score well on the training set; e.g., you can dial down data augmentation if you are using it, and so on.
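For the augmentation case, the same parametric dial-down works. A sketch using Keras preprocessing layers, which are active only during training; it reuses the imports and data from the first sketch, and the augmentation factors here are illustrative values, not anything from OP:

```python
# Augmentation layers (RandomRotation etc.) only fire during training,
# so they depress training accuracy the same way dropout does.
# strength=0.0 turns them into no-ops.
def build_augmented_model(strength):
    return tf.keras.Sequential([
        layers.Rescaling(1.0 / 255),
        layers.RandomRotation(0.1 * strength),
        layers.RandomTranslation(0.1 * strength, 0.1 * strength),
        layers.RandomZoom(0.1 * strength),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])

for strength in [1.0, 0.5, 0.0]:
    model = build_augmented_model(strength)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, tr = model.evaluate(x_train, y_train, verbose=0)
    _, te = model.evaluate(x_test, y_test, verbose=0)
    print(f"aug strength={strength}: train={tr:.3f} test={te:.3f}")
```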
People have touched on most of this already, but it is very common to see when using dropout layers.
redditnit21 OP t1_isoa0lc wrote
I commented from a different account by mistake. After looking at everyone’s comments, I tried without dropout and the same thing is happening. I am not using any data augmentation except rescaling (1/255).
BrotherAmazing t1_isosnle wrote
Then indeed I would try different randomized training/test set splits to rule that out as one step in the debugging.
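Something like the following, again reusing pieces from the sketches above (the number of trials and split fraction are arbitrary): resplit with a fresh seed each time and count how often test beats train.

```python
# Repeated random resplits: if test > train on almost every resplit,
# the split itself is not the cause. Reuses build_model, x_all, y_all.
from sklearn.model_selection import train_test_split

n_trials, wins = 10, 0
for seed in range(n_trials):
    x_tr, x_te, y_tr, y_te = train_test_split(
        x_all, y_all, test_size=0.2, random_state=seed)
    model = build_model(dropout_rate=0.0)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_tr, y_tr, epochs=3, verbose=0)
    _, tr = model.evaluate(x_tr, y_tr, verbose=0)
    _, te = model.evaluate(x_te, y_te, verbose=0)
    wins += te > tr
print(f"test beat train on {wins}/{n_trials} random splits")
```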