Submitted by AKavun t3_105na47 in deeplearning
trajo123 t1_j3c38rx wrote
Reply to comment by trajo123 in Why didn't my convolutional image classifier network learn anything! by AKavun
Several things I noticed in your code:
- your model doesn't apply any transfer (activation) function between layers
- the combination of final activation function and loss function is incorrect
- for a CNN you should be using BatchNorm2d layers
The code should look something like this:
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    def __init__(self, input_size, num_classes):
        super(CNNClassifier, self).__init__()
        self.input_size = input_size
        self.num_classes = num_classes
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=1, padding=1)  # increase the number of channels
        self.bn1 = nn.BatchNorm2d(32)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=128, kernel_size=3, stride=1, padding=1)  # in_channels must match conv1's out_channels; increase the number of channels
        self.bn2 = nn.BatchNorm2d(128)
        self.fc1 = nn.Linear(128, 256)  # note the smaller numbers
        self.fc2 = nn.Linear(256, num_classes)
        self.final_pool = nn.AdaptiveAvgPool2d(1)  # before flatten, use AdaptiveMaxPool2d or AdaptiveAvgPool2d to get rid of the spatial dimensions, essentially treating each filter as one feature
        # self.softmax = nn.Softmax(dim=1) - not needed, see below. Also, Softmax is not correct for use with NLLLoss; the correct one would be LogSoftmax(dim=1)
        self.f = nn.ReLU()

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool(x)
        x = self.f(x)    # apply the transfer function
        x = self.bn1(x)  # apply batch norm (this can also be placed before the transfer function)
        x = self.conv2(x)
        x = self.pool(x)
        x = self.f(x)    # apply the transfer function
        x = self.bn2(x)  # apply batch norm (this can also be placed before the transfer function)
        # since you are now using batchnorm, you could add a few more blocks like the one above; vanishing gradients are less of a concern now
        x = self.final_pool(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = self.f(x)    # apply the transfer function; here you could try tanh as well
        x = self.fc2(x)
        # no softmax here: it is incorporated into the loss function for numerical/computational efficiency reasons
        return x
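To sanity-check the shapes, a quick forward pass with dummy data would look something like this (batch size, image size and class count are just placeholder values):

model = CNNClassifier(input_size=64, num_classes=2)
x = torch.randn(16, 3, 64, 64)  # dummy batch of 16 RGB images, 64x64
logits = model(x)               # raw scores, no softmax applied
print(logits.shape)             # torch.Size([16, 2])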
Also, the loss should be
# criterion = nn.NLLLoss()
criterion = nn.CrossEntropyLoss() # the more natural choice of loss function for classification; for binary classification the more natural choice would be BCEWithLogitsLoss, but then you need to set the number of output units to 1.
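Put together, a single training step would look roughly like this (assuming model, optimizer, images and labels already exist in your training loop; note that CrossEntropyLoss is just LogSoftmax followed by NLLLoss, fused for numerical stability):

# multi-class: model outputs num_classes logits, targets are class indices
criterion = nn.CrossEntropyLoss()
logits = model(images)            # shape [batch, num_classes]
loss = criterion(logits, labels)  # labels: shape [batch], dtype long
optimizer.zero_grad()
loss.backward()
optimizer.step()

# binary alternative: build the model with num_classes=1 and use BCEWithLogitsLoss
criterion = nn.BCEWithLogitsLoss()
logits = model(images)                                 # shape [batch, 1]
loss = criterion(logits, labels.float().unsqueeze(1))  # labels as 0./1. floats, shape [batch, 1]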