Submitted by rubbledubbletrubble t3_zmxbb5 in deeplearning
BrotherAmazing t1_j0fai9p wrote
In this case, I don’t think anyone can tell you wtf is going on without a copy of your code and dataset. There are just so many unknowns, but is this 1000-dim dense layer the last layer before the softmax?
Are you training the other layers first and then inserting this new, freshly initialized layer between the trained layers, or are you adding it as part of a new architecture, re-initializing all the weights, and training from scratch?
rubbledubbletrubble OP t1_j0iib4p wrote
The 1000-unit layer is the softmax output layer. I am using a pretrained model and training only the classification layers. My logic is to reduce the dimensionality between the feature extractor's output and the softmax in order to reduce the total number of parameters.
For example: if MobileNet outputs 1280 features and I connect them directly to a 1000-unit dense layer, that is 1280 × 1000 = 1.28 million weights. But if I add a 500-unit layer in the middle, it becomes 1280 × 500 + 500 × 1000 = 1.14 million, so the network gets smaller (see the sketch below).
I know the question is a bit vague. I was just curious.
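A minimal Keras sketch of the setup being described, not OP's actual code: the choice of MobileNetV2, the 224×224 input shape, and the ReLU activation on the bottleneck layer are assumptions for illustration. It compares the two head configurations so the parameter counts above can be verified with `summary()`.

```python
# Hypothetical sketch (not OP's code): two classification heads on top of a
# frozen MobileNetV2 feature extractor that emits 1280-dim pooled features.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3)
)
base.trainable = False  # train only the classification head, as OP describes

# Head A: direct 1280 -> 1000 softmax
# trainable params: 1280 * 1000 weights + 1000 biases = 1,281,000
head_a = models.Sequential([
    base,
    layers.Dense(1000, activation="softmax"),
])

# Head B: 500-unit bottleneck, 1280 -> 500 -> 1000
# trainable params: (1280*500 + 500) + (500*1000 + 1000) = 1,141,500
head_b = models.Sequential([
    base,
    layers.Dense(500, activation="relu"),  # activation is an assumption
    layers.Dense(1000, activation="softmax"),
])

head_a.summary()  # confirms ~1.28M trainable params in the head
head_b.summary()  # confirms ~1.14M trainable params in the head
```

The saving comes from replacing one 1280×1000 weight matrix with two smaller ones (1280×500 and 500×1000), at the cost of forcing the features through a 500-dim bottleneck.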