Submitted by rubbledubbletrubble t3_zmxbb5 in deeplearning
I was recently messing around with architectures.
I tried this: Conv layer (2000-dim output) → Dense layer (500-dim) → Dense layer (1000-dim).
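Roughly like this in Keras (the input shape, conv settings, and output head below are just placeholders, not my exact code; only the 2000 → 500 → 1000 shape matters):

```python
from tensorflow.keras import layers, models

# Placeholder sketch of the setup described above.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                      # placeholder input shape
    layers.Conv2D(20, 3, strides=3, activation="relu"),   # placeholder conv layer
    layers.Flatten(),                                     # ~2000 features out of the conv
    layers.Dense(500, activation="relu"),                 # the smaller middle layer I added
    layers.Dense(1000, activation="relu"),
    layers.Dense(10, activation="softmax"),               # placeholder output head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```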
Doing this completely broke my results. The model no longer trained on the dataset.
Can this be explained or do I have a bug in my code?
I assume this is because adding a smaller middle layer reduces the amount of information the network can pass forward.
Edit: I tried middle-layer widths ranging from 100 to 950, and all of the models give the same result: about 0.5% accuracy.
Edit 2: The model trains well if I remove the activation from the middle layer. Not sure why.
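Concretely, in the placeholder sketch from the post, the only difference between the broken and working versions is the middle layer's activation (again, a simplified stand-in for my real code):

```python
from tensorflow.keras import layers, models

def build_model(middle_activation):
    # Same placeholder architecture as above; only the middle layer's activation changes.
    return models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(20, 3, strides=3, activation="relu"),
        layers.Flatten(),                                  # ~2000 features
        layers.Dense(500, activation=middle_activation),   # the middle layer in question
        layers.Dense(1000, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])

broken = build_model("relu")   # stuck at ~0.5% accuracy for me
works = build_model(None)      # linear middle layer: trains fine
```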
suflaj t1_j0dojq2 wrote
You introduced a bottleneck. Either you needed to train it longer, or your bottleneck destroyed part of the information needed for better performance.