BrotherAmazing t1_j0fai9p wrote

In this case, I don’t think anyone can tell you wtf is going on without a copy of your code and dataset. There are just so many unknowns, but is this 1000 dim dense layer the last layer before a softmax?

Are you training the other layers first and then inserting this new layer (with fresh weight initialization) between the trained layers, or are you adding it as part of a new architecture, re-initializing all the weights, and training from scratch?

5

rubbledubbletrubble OP t1_j0iib4p wrote

The 1000-unit layer is the softmax layer. I am using a pretrained model and only training the classification layers. My logic is to reduce the dimensionality of the feature extractor's output before the softmax, so that the total number of parameters goes down.

For example: if MobileNet outputs 1280 features and I connect them straight to a 1000-unit dense layer, that's 1280 × 1000 = 1.28 million weights. But if I add a 500-unit layer in the middle, it's 1280 × 500 + 500 × 1000 ≈ 1.14 million, so the network gets smaller.
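To make the arithmetic concrete, here's a minimal Keras sketch of just the classifier head (the 1280-dim input and the 500-unit bottleneck are the example numbers from above, not anything from a specific model):

```python
import tensorflow as tf

# Direct head: 1280 features -> 1000-way softmax
direct = tf.keras.Sequential([
    tf.keras.Input(shape=(1280,)),
    tf.keras.layers.Dense(1000, activation="softmax"),
])
# 1280 * 1000 weights + 1000 biases = 1,281,000 params
print(direct.count_params())

# Bottleneck head: 1280 -> 500 -> 1000-way softmax
bottleneck = tf.keras.Sequential([
    tf.keras.Input(shape=(1280,)),
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dense(1000, activation="softmax"),
])
# (1280 * 500 + 500) + (500 * 1000 + 1000) = 1,141,500 params
print(bottleneck.count_params())
```

With these particular sizes the bottleneck saves about 140k parameters; the saving grows as the intermediate layer shrinks, at the cost of squeezing the features through a narrower representation.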

I know the question is a bit vague. I was just curious.

1