Submitted by AKavun t3_105na47 in deeplearning
So I am trying to make a CNN image classifier with two classes, good and bad. The aim is to look at photoshoot pictures of the kind found on fashion sites and find the "best one". I trained this model for 150 epochs and its loss did not change at all (roughly). Details are as follows:
My criterion for "best one", and the way I structured my dataset, is that the photo shows the whole outfit or the model's whole body, not just the upper or lower body. I also labeled the photos where the model's back was turned to the camera. My training set has 1304 good photos and 2000 bad photos. My validation set has 300 photos per class (600 total).
Architecture is as follows: Conv > Pool > Conv > Pool > Flatten > Linear > Linear > Softmax. For details of the architecture like strides etc., check out the code I provided.
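For readers without the linked code, here is a minimal sketch of what that layer sequence might look like in PyTorch; the channel counts, kernel sizes, and ReLU placements are my own assumptions, since the original code is not reproduced in this post.

```python
import torch.nn as nn

# Hypothetical reconstruction of the described Conv > Pool > Conv > Pool >
# Flatten > Linear > Linear > Softmax stack; exact sizes/strides come from
# the original (unshown) code.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # Conv
    nn.ReLU(),
    nn.MaxPool2d(2),                              # Pool: 224 -> 112
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # Conv
    nn.ReLU(),
    nn.MaxPool2d(2),                              # Pool: 112 -> 56
    nn.Flatten(),                                 # Flatten
    nn.Linear(32 * 56 * 56, 128),                 # Linear
    nn.ReLU(),
    nn.Linear(128, 2),                            # Linear (good / bad)
    nn.Softmax(dim=1),                            # Softmax
)
```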
I have to use softmax since in my application I need to see the probabilities of a photo being good or bad. That is why I am not using cross-entropy loss but instead negative log-likelihood loss. Adam is my optimizer.
Other hyperparameters: batch size: 64, number of epochs: 150, input size: (224, 224), number of classes: 2, learning rate: 0.01, weight decay: 0.01
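As a point of reference, here is a minimal, self-contained sketch of that loss/optimizer setup with those hyperparameters, assuming PyTorch. One detail worth double-checking against the real script: PyTorch's `nn.NLLLoss` expects log-probabilities, so the sketch ends the model with `LogSoftmax` and recovers plain probabilities with `.exp()`; the tiny stand-in model and random data are only there to make the snippet runnable.

```python
import torch
import torch.nn as nn

# Hyperparameters as listed in the post.
batch_size = 64
num_classes = 2
learning_rate = 0.01
weight_decay = 0.01

# Stand-in for the CNN above, ending in LogSoftmax because nn.NLLLoss is
# defined over log-probabilities rather than softmax outputs.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, num_classes),
    nn.LogSoftmax(dim=1),
)

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,
                             weight_decay=weight_decay)

# One dummy training step on random data, just to show the wiring.
images = torch.randn(batch_size, 3, 224, 224)
labels = torch.randint(0, num_classes, (batch_size,))

log_probs = model(images)            # shape: (batch_size, num_classes)
loss = criterion(log_probs, labels)
probs = log_probs.exp()              # readable "good"/"bad" probabilities

optimizer.zero_grad()
loss.backward()
optimizer.step()
```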
I trained with this script for 150 epochs. The model started at a loss of 0.5253 and ended at 0.5324. I took snapshots every 10 epochs, but the model did not learn anything over the course of training. This is what my learning curve looks like: [learning curve plot from the original post]
Now I know there are many, many things I could do to make the model perform better, like initializing from a pretrained model, doing more with transforms, etc. But the problem with this model is not that it is performing poorly, it is that it is not performing at all! I also have a validation accuracy check, and it stays around 50% throughout training, for a classifier with 2 classes. So I am assuming I am doing something very obviously wrong, any idea what?
trajo123 t1_j3busy6 wrote
First of all, the dataset is way too small to train a model from scratch and get meaningful results on this relatively complex task (more complex than MNIST, for example, which has a training set of 60,000 images). Second, your model is way too small/simple for this task, even if you had 100 times more data. I strongly suggest transfer learning: fine-tune a pre-trained model by replacing the classification head, freezing the rest of the model in place, and training on your dataset.
Something along these lines:
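(The code block from the original comment is not reproduced here; the sketch below is a reconstruction using torchvision's ResNet-18, where the specific backbone and learning rate are illustrative choices.)

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet (ResNet-18 is just one choice).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze all pre-trained weights so only the new head gets updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with a fresh 2-class (good/bad) layer.
model.fc = nn.Linear(model.fc.in_features, 2)

# Train only the parameters of the new head.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# CrossEntropyLoss takes raw logits; apply torch.softmax(logits, dim=1)
# at inference time to read off the good/bad probabilities.
criterion = nn.CrossEntropyLoss()
```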
In the pre-trained model documentation you will see what training recipe was used and what transforms were applied to the images. Typically:
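(Again, the original code block is not reproduced here; the values below are torchvision's standard ImageNet preprocessing, so check them against the documentation of the specific weights you use.)

```python
from torchvision import transforms

# Standard ImageNet-style preprocessing used by most torchvision pre-trained
# models; the exact sizes and statistics are listed in each model's weight docs.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```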
See more at <https://pytorch.org/vision/stable/models.html#table-of-all-available-classification-weights>. You can also find pre-trained vision models on Hugging Face.
Hope this helps, good luck!