Submitted by AKavun t3_105na47 in deeplearning
So I am trying to make a CNN image classifier with two classes, good and bad. The aim is to look at photoshoot pictures of the kind found on fashion sites and find the "best one". I trained this model for 150 epochs and its loss barely changed at all. Details are as follows:
My metric for the "best one", which is also how I structured my dataset, is that the photo shows the whole outfit or the whole body of the model, not just the upper or lower body. I also labeled the photos where the model's back is turned to the camera. My training set has 1304 good photos and 2000 bad photos. My validation set has 300 photos per class (so 600 in total).
Architecture is as follows: Conv > Pool > Conv > Pool > Flatten > Linear > Linear > Softmax. For details of the architecture, like strides etc., check out the code I provided.
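Since the linked code is not reproduced here, the following is only a minimal sketch of that architecture in PyTorch; the channel counts, kernel sizes, and strides are illustrative guesses, not the original values. The one deliberate deviation is the final LogSoftmax instead of Softmax, because PyTorch's nn.NLLLoss (used below) expects log-probabilities; the good/bad probabilities can still be recovered with .exp().

```python
import torch.nn as nn

class GoodBadCNN(nn.Module):
    """Sketch of: Conv > Pool > Conv > Pool > Flatten > Linear > Linear > (Log)Softmax.
    Channel counts, kernel sizes, and strides are illustrative assumptions."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # Conv
            nn.ReLU(),
            nn.MaxPool2d(2),                               # Pool: 224 -> 112
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # Conv
            nn.ReLU(),
            nn.MaxPool2d(2),                               # Pool: 112 -> 56
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # Flatten
            nn.Linear(32 * 56 * 56, 128),                  # Linear
            nn.ReLU(),
            nn.Linear(128, num_classes),                   # Linear
            nn.LogSoftmax(dim=1),  # log-probabilities for nn.NLLLoss; .exp() gives probabilities
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```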
I have to use softmax since in my application I need to see the probabilities of being good and bad. That is why I am not using cross-entropy loss and am instead using negative log-likelihood loss. Adam is my optimizer.
Other hyperparameters: batch size: 64, number of epochs: 150, input size: (224, 224), number of classes: 2, learning rate: 0.01, weight decay: 0.01
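Putting the stated loss, optimizer, and hyperparameters together, a hedged training-loop sketch (reusing the GoodBadCNN sketch above; the random TensorDataset is only a stand-in for the real good/bad image folders) would look roughly like this:

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters as stated in the post
batch_size = 64
num_epochs = 150
learning_rate = 0.01
weight_decay = 0.01

# Stand-in dataset of random tensors so the sketch runs end-to-end;
# the real data would come from the good/bad photo folders.
train_dataset = TensorDataset(torch.randn(8, 3, 224, 224),
                              torch.randint(0, 2, (8,)))
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

model = GoodBadCNN(num_classes=2)
criterion = nn.NLLLoss()  # expects log-probabilities, hence LogSoftmax in the model
optimizer = optim.Adam(model.parameters(),
                       lr=learning_rate, weight_decay=weight_decay)

for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        log_probs = model(images)            # (batch, 2) log-probabilities
        loss = criterion(log_probs, labels)
        loss.backward()
        optimizer.step()

# At inference time the good/bad probabilities are recovered with exp()
with torch.no_grad():
    probs = model(images).exp()
```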
I trained with this script for 150 epochs. The model started with a loss of 0.5253 and ended with a loss of 0.5324. I took snapshots every 10 epochs, but the model did not learn anything throughout training; the learning curve is essentially flat for the whole run.
Now I know that there are many, many things I can do to make the model perform better, like initializing with a pretrained model, doing more with transforms, etc. But the problem with this model is not that it is performing poorly, it is that it is not performing at all! I also have a validation accuracy check and it stays around 50% throughout training, for a classifier with 2 classes. So I am assuming I am doing something very obviously wrong, any idea what?
suflaj t1_j3bt2eq wrote
That learning rate is about 100 times higher than what you would give to Adam for that batch size. That weight decay is also about 100 times higher, and if you want to use weight decay with Adam, you should probably use the AdamW optimizer (which is more or less the same thing, it just fixes the interaction between Adam and weight decay).
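One possible reading of this advice, as a hedged sketch (the values below are just the stated hyperparameters divided by 100, not numbers prescribed by the comment):

```python
import torch.optim as optim

# Scale both the learning rate and the weight decay down by ~100x and
# switch to AdamW, which decouples weight decay from the Adam update.
optimizer = optim.AdamW(
    model.parameters(),
    lr=1e-4,            # 0.01 / 100
    weight_decay=1e-4,  # 0.01 / 100
)
```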
Also, loss is not something that determines how much a model has learned. You should check out validation F1, or whatever metrics are relevant for the performance of your model.
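For example, a hedged sketch of computing validation accuracy and F1 with scikit-learn, reusing the model sketch from the question (the val_loader here is only a random stand-in for the real 600-image validation set):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import accuracy_score, f1_score

# Stand-in validation loader so the snippet runs; the real one would wrap
# the 300-per-class validation set described in the post.
val_loader = DataLoader(
    TensorDataset(torch.randn(16, 3, 224, 224), torch.randint(0, 2, (16,))),
    batch_size=8,
)

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images).argmax(dim=1)  # predicted class per image
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

print("val accuracy:", accuracy_score(all_labels, all_preds))
print("val F1:", f1_score(all_labels, all_preds))  # binary F1 for good vs. bad
```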