Submitted by International_Deer27 t3_10qhscf in deeplearning

Hi all! I'm currently implementing a CNN in PyTorch and having a hard time with it. It's for a binary classification problem, and my loss keeps fluctuating without a pattern. I've tried many things that I saw online: the CrossEntropyLoss function, BCELoss with Sigmoid, BCEWithLogitsLoss, reducing the network to only a couple of layers, gradient accumulation instead of normal optimization... My dataset has about 5500 samples; each input X is a 2000x5 matrix and each label Y is 0 or 1. How should I proceed?

3

Comments

like_a_tensor t1_j6q5cs1 wrote

Are you implementing the CNN from scratch? If so, the problem might be in your implementation.

Play with the batch size and batch norm. Try different optimizers. Your learning rate might also be too large; experiment with smaller learning rates or something like torch's ReduceLROnPlateau.

5500 samples is also pretty small, so maybe try a shallower network.
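
The idea behind a plateau-based schedule such as ReduceLROnPlateau can be sketched in plain Python. This is only an illustration of the logic, not PyTorch's implementation: the class name `PlateauLR`, its `factor` and `patience` defaults, and the loss values are all assumptions made up for the example.

```python
class PlateauLR:
    """Toy plateau scheduler: cut the learning rate when the loss stops improving."""

    def __init__(self, lr, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor      # multiply lr by this once a plateau is detected
        self.patience = patience  # number of non-improving epochs to tolerate
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            # New best loss: remember it and reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauLR(lr=1e-3, factor=0.1, patience=2)
losses = [0.70, 0.69, 0.69, 0.70, 0.71, 0.68]  # synthetic validation losses
lrs = [scheduler.step(loss) for loss in losses]
print(lrs)
```

With these synthetic losses the rate stays at 1e-3 until the third non-improving epoch in a row, then drops by a factor of ten.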

2

International_Deer27 OP t1_j6rxo78 wrote

Yes, I am. I also uploaded the code below in case you can have a look. I'll look into ReduceLROnPlateau.

1

sulpha1 t1_j6qyygr wrote

You can also post the code for help. I would also suggest looking at PyTorch's forums if you haven't already.

1

International_Deer27 OP t1_j6rxjtd wrote

    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader
    from sklearn.model_selection import train_test_split
    import numpy as np

    # df_X_MACE and df_Y_MACE are assumed to be loaded earlier (not shown in the post)
    df_Y_MACE = np.array(df_Y_MACE)
    df_X_MACE = np.array(df_X_MACE)
    X = torch.from_numpy(df_X_MACE).float()
    Y = torch.from_numpy(df_Y_MACE).float()

    # Define the dataset
    class ECGDataset(Dataset):
        def __init__(self, data, labels):
            self.data = data
            self.labels = labels

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx], self.labels[idx]

    # Split the data into training and testing sets
    train_data, test_data, train_labels, test_labels = train_test_split(X, Y, test_size=0.2)

    # Create the dataset and data loader
    train_dataset = ECGDataset(train_data, train_labels)
    test_dataset = ECGDataset(test_data, test_labels)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

    # Define the CNN: one identical convolutional branch per input column
    class ECGClassifier(nn.Module):
        def __init__(self):
            super(ECGClassifier, self).__init__()
            self.fc = nn.Linear(128*5, 1)
            self.act = nn.ReLU()
            self.sigmoid = nn.Sigmoid()
            self.dropout = nn.Dropout(0.5)
            # Use nn.ModuleList rather than plain Python lists: layers kept in a
            # plain list are not registered, so model.parameters() would skip them
            # and the optimizer would only ever update self.fc.
            self.layers = nn.ModuleList([nn.ModuleList() for _ in range(5)])
            for i in range(5):
                self.layers[i].append(nn.Conv1d(1, 32, kernel_size=20, stride=5))
                self.layers[i].append(nn.BatchNorm1d(32))
                self.layers[i].append(nn.MaxPool1d(7,2))
                self.layers[i].append(nn.Conv1d(32, 64, kernel_size=16, stride=5))
                self.layers[i].append(nn.BatchNorm1d(64))
                self.layers[i].append(nn.MaxPool1d(7,3))
                self.layers[i].append(nn.Conv1d(64, 128, kernel_size=2, stride=3))
                self.layers[i].append(nn.BatchNorm1d(128))
                self.layers[i].append(nn.Linear(4, 1))
                self.layers[i].append(nn.BatchNorm1d(128))
                self.layers[i].append(nn.Dropout(0.5))

        def forward(self, x):
            x_cols = [[], [], [], [], []]
            for i in range(5):
                # Take column i of every sample and add a channel dimension
                x_cols[i] = x[:,:,i].unsqueeze(1)
                x_cols[i] = self.layers[i][0](x_cols[i])
                x_cols[i] = self.layers[i][1](x_cols[i])
                x_cols[i] = self.act(x_cols[i])
                x_cols[i] = self.layers[i][2](x_cols[i])
                x_cols[i] = self.layers[i][3](x_cols[i])
                x_cols[i] = self.layers[i][4](x_cols[i])
                x_cols[i] = self.act(x_cols[i])
                x_cols[i] = self.layers[i][5](x_cols[i])
                x_cols[i] = self.layers[i][6](x_cols[i])
                x_cols[i] = self.layers[i][7](x_cols[i])
                x_cols[i] = self.act(x_cols[i])
                x_cols[i] = self.layers[i][8](x_cols[i])
                x_cols[i] = self.layers[i][9](x_cols[i])
                x_cols[i] = self.layers[i][10](x_cols[i])
            # Concatenate the five branch outputs along the channel dimension
            x = torch.cat((*x_cols, ), 1)
            x = x.view(-1, 128*5)
            x = self.fc(x)
            x = self.sigmoid(x)
            return x

    # Define the model and move it to the device
    device = torch.device('cpu')
    model = ECGClassifier()
    model = model.to(device)
    model = model.float()

    # Define the loss function and optimizer
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.001)

    # Train the model
    for epoch in range(5):
        for i, (data, labels) in enumerate(train_loader):
            data, labels = data.to(device), labels.to(device)
            # Forward pass
            with torch.set_grad_enabled(True):
                outputs = model(data)
                labels = labels.unsqueeze(1)
                loss = criterion(outputs, labels)
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Printed once per epoch: this is the loss of the last batch only
        print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, 5, loss.item()))
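
A quick way to sanity-check the layer sizes in the model above is to compute the output length of each convolution and pooling step by hand. The sketch below assumes no padding and dilation 1 (the PyTorch defaults for `Conv1d` and `MaxPool1d`) and a 2000-point input per column; the helper name `out_len` is made up for the example.

```python
def out_len(length, kernel_size, stride):
    """Output length of Conv1d/MaxPool1d with no padding and dilation 1."""
    return (length - kernel_size) // stride + 1


length = 2000  # points per input column, as stated in the post
trace = [length]
# (kernel_size, stride) of each conv/pool layer in one branch, in order
for kernel_size, stride in [(20, 5), (7, 2), (16, 5), (7, 3), (2, 3)]:
    length = out_len(length, kernel_size, stride)
    trace.append(length)

print(trace)  # [2000, 397, 196, 37, 11, 4]
```

The final length of 4 matches the `nn.Linear(4, 1)` at the end of each branch, so the shapes themselves are consistent.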

1

BlacksmithNo4415 t1_j6tfpkg wrote

Try using markdown:

        plotter = DLPlotter()     # add this line
        model = MyModel()
        ...
        total_loss = 0
        for epoch in range(5):
            for step, (x, y) in enumerate(loader):
                ...
                output = model(x)
                loss = loss_func(output, y)
                total_loss += loss.item()
                ...
        config = dict(lr=0.001, batch_size=64, ...)
        plotter.collect_parameter("exp001", config, total_loss / (5 * len(loader)))     # add this line
        plotter.construct()     # add this line
1

International_Deer27 OP t1_j6usb0i wrote

I’m not sure about DLPlotter. Which library did you get it from? I can’t seem to find it. I’m using Python 3.

1

BlacksmithNo4415 t1_j6usty4 wrote

No, that was example code to show you how much more readable code is when you use markdown.

DLPlotter is a library I am building at the moment. :)

1

International_Deer27 OP t1_j6uufti wrote

Ah alright, thanks, I’ll try and see how else I can modify the code and get it working. Good luck with the library!

1

International_Deer27 OP t1_j6x0tpy wrote

I've simplified my model a lot to only take into account 2000x1 tensors as input for X and the prediction is either 0 or 1 as before. I've made it using nn.Sequential with only a few layers to be easier to follow:

    import torch
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader
    import torch.nn.functional as F
    from sklearn.model_selection import train_test_split
    import numpy as np
    import matplotlib.pyplot as plt

    # df_X_MACE and df_Y_MACE are assumed to be loaded earlier (not shown in the post)
    df_Y_MACE = np.array(df_Y_MACE)

    # Take the first entry of each sample as the single input signal
    df_X_MACE1 = []
    for i in range(len(df_X_MACE)):
        df_X_MACE1.append(df_X_MACE[i][0])
    df_X_MACE1 = np.array(df_X_MACE1)

    X = torch.from_numpy(df_X_MACE1).float()
    Y = torch.from_numpy(df_Y_MACE).float()

    # Define the dataset
    class ECGDataset(Dataset):
        def __init__(self, data, labels):
            self.data = data
            self.labels = labels

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx], self.labels[idx]

    # Split the data into training and testing sets
    # Note: test_size=0.8 holds out 80% of the data for testing (the first version used 0.2)
    train_data, test_data, train_labels, test_labels = train_test_split(X, Y, test_size=0.8)

    # Create the dataset and data loader
    train_dataset = ECGDataset(train_data, train_labels)
    test_dataset = ECGDataset(test_data, test_labels)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

    # Define the CNN
    class ECGClassifier(nn.Module):
        def __init__(self):
            super(ECGClassifier, self).__init__()
            # For a 2000-point input: (B, 1, 2000) -> conv -> (B, 32, 391)
            # -> pool -> (B, 32, 193) -> linear -> (B, 32, 1)
            self.ECG_seq = nn.Sequential(nn.Conv1d(1, 32, kernel_size = 50, stride = 5), nn.ReLU(), nn.MaxPool1d(7,2), nn.Linear(193,1))
            self.fc = nn.Linear(32, 1)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):
            x = x.unsqueeze(1)  # add the channel dimension
            out = self.ECG_seq(x)
            out = self.fc(out.view(-1,32))
            out = self.sigmoid(out)
            return out

    # Define the model and move it to the device
    device = torch.device('cpu')
    model = ECGClassifier()
    model = model.to(device)
    model = model.float()

    # Define the loss function and optimizer
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.01)
    total_loss = []

    # Train the model
    for epoch in range(5):
        for i, (data, labels) in enumerate(train_loader):
            data, labels = data.to(device), labels.to(device)
            # Forward pass
            with torch.set_grad_enabled(True):
                outputs = model(data)
                labels = labels.unsqueeze(1)
                loss = criterion(outputs, labels)
            # Store the Python float, not the tensor, so the graph is not kept alive
            total_loss.append(loss.item())
            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Printed once per epoch: this is the loss of the last batch only
        print ('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, 5, loss.item()))
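
One thing worth checking in the code above is how much data is actually used for training with `test_size=0.8`. The arithmetic below is a rough sketch: the sample count of 5500 is the approximate figure from the original post, and the helper `split_sizes` is made up for the example (it mirrors scikit-learn rounding the test share up).

```python
import math

n_samples = 5500   # "about 5500" per the original post; the exact count is an assumption
batch_size = 32


def split_sizes(n, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(n * test_size)  # the test share is rounded up
    return n - n_test, n_test


results = {}
for test_size in (0.8, 0.2):
    n_train, n_test = split_sizes(n_samples, test_size)
    # DataLoader keeps the last partial batch by default (drop_last=False)
    steps = math.ceil(n_train / batch_size)
    results[test_size] = (n_train, n_test, steps)
    print(test_size, n_train, n_test, steps)
```

Under these assumptions, `test_size=0.8` leaves roughly 1100 training samples and 35 optimizer steps per epoch, versus about 4400 samples and 138 steps with `test_size=0.2`.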

1

International_Deer27 OP t1_j6x0yff wrote

For this new model the loss looks pretty much the same:

Epoch [1/5], Loss: 0.8073

Epoch [2/5], Loss: 0.8680

Epoch [3/5], Loss: 0.5826

Epoch [4/5], Loss: 0.7626

Epoch [5/5], Loss: 0.6099
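
For reference, it helps to compare these numbers with the loss of a model that has learned nothing. A classifier that always outputs 0.5 has a binary cross-entropy of ln 2, whatever the labels are. The short sketch below computes that value and the mean of the five printed losses; the helper name `bce` is made up for the example.

```python
import math


def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


chance = bce(0.5, 1)  # same value for y = 0
printed = [0.8073, 0.8680, 0.5826, 0.7626, 0.6099]  # losses quoted above
mean_printed = sum(printed) / len(printed)

print(f"chance-level BCE: {chance:.4f}")        # 0.6931
print(f"mean of printed:  {mean_printed:.4f}")  # 0.7261
```

The printed values scatter around the chance level. Since only one line is printed per epoch, each value is most likely the loss of a single 32-sample batch, which is noisy by nature.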

1

BlacksmithNo4415 t1_j6x2xia wrote

I've checked for papers that do exactly what you want.

So, as I assumed, this data is time-sensitive and therefore you need an additional temporal dimension.

This model needs to be more complex in order to solve this problem.

I suggest reading this:

https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-021-01736-y

BTW: have you tried grid search for finding the right hyperparameters?

Oh, and your model does improve.

Have you increased the data set size?

1

Etodmitry22 t1_j6rpice wrote

The loss will always fluctuate, especially for complex networks/tasks. What you should care about is the loss decreasing overall and the metrics improving on the test set. A loss with no fluctuation and perfect convergence is very rare; it is mostly seen in ML tutorials, not in real-world cases.

If you do not see any overall improvement, try to overfit on a small subset of the training data. If your model cannot overfit a small subset, there are bugs in your model or data.
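
To judge whether the loss is decreasing overall, it can help to smooth the per-batch values before looking at them. Below is a minimal sketch of an exponential moving average with bias correction. The function name `ema`, the `beta` value, and the loss series are assumptions for illustration; the series is synthetic (a slow downward trend plus alternating noise), not real training output.

```python
def ema(values, beta=0.9):
    """Exponential moving average with bias correction for the first steps."""
    smoothed, avg = [], 0.0
    for t, v in enumerate(values, start=1):
        avg = beta * avg + (1 - beta) * v
        smoothed.append(avg / (1 - beta ** t))
    return smoothed


# Synthetic per-batch losses: downward trend with +/-0.2 alternating noise
raw = [1.0 - 0.01 * i + (0.2 if i % 2 else -0.2) for i in range(50)]
smoothed = ema(raw)


def spread(xs):
    return max(xs) - min(xs)


print(f"raw spread (last 10):      {spread(raw[-10:]):.3f}")
print(f"smoothed spread (last 10): {spread(smoothed[-10:]):.3f}")
```

The raw values jump around from batch to batch, while the smoothed curve makes the underlying downward trend visible.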

1

BlacksmithNo4415 t1_j6tfgmy wrote

To me it also sounds like a bad learning rate. Have you checked the distribution of your weights for each layer at each step?

P.S.: try hyperparameter optimization methods like grid search or Bayesian optimization. That way you get an answer to your question faster.
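
A grid search over a couple of hyperparameters can be sketched in a few lines. Everything here is illustrative: `train_and_evaluate` is a hypothetical stand-in that returns a made-up score so the example runs, and the grid values are assumptions. In practice it would train the model and return its validation loss.

```python
from itertools import product


def train_and_evaluate(lr, batch_size):
    """Hypothetical stand-in for a real training run.

    Returns a placeholder "validation loss" so the example is runnable;
    replace the body with actual training and evaluation.
    """
    return abs(lr - 1e-3) * 100 + abs(batch_size - 64) / 1000


grid = {"lr": [1e-2, 1e-3, 1e-4], "batch_size": [16, 32, 64]}

results = []
for lr, batch_size in product(grid["lr"], grid["batch_size"]):
    val_loss = train_and_evaluate(lr, batch_size)
    results.append(((lr, batch_size), val_loss))

# Pick the configuration with the lowest validation loss
best_config, best_loss = min(results, key=lambda item: item[1])
print(len(results), best_config, best_loss)
```

The number of runs grows multiplicatively with every added hyperparameter, which is why random or Bayesian search is often preferred for larger search spaces.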

1

BlacksmithNo4415 t1_j6uwn1n wrote

I can try to help you, though; I worked as a deep learning engineer in computer vision:

  1. Do you mean the dimension of one sample is [2000, 5]? That is a very unusual shape for an image. Images usually have a shape of [h, w, 3], and [h, w, 4] for video data, where an additional temporal dimension is added.
  2. What should this model classify? So far it sounds fairly trivial, but depending on the object it might be a bit more complex.
  3. The more complex your task, the more complex your model must be, and the larger the data set you will need.
  4. How are the labels distributed in your data set?
  5. Do you use adversarial attacks for robustness? Don't do that at the beginning.
  6. Are you sure that a CNN is the proper model for signal classification?
  7. How do you want to represent your dataset? What information should the third axis represent?
  8. BTW, dropout also makes it more difficult for the model to overfit. You use it so the model learns to generalize.
  9. I think the model is way too complex when the task is actually trivial, but I have never done any signal classification.
  10. The use of sigmoid can lead to vanishing gradients when its output saturates.
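
On the label distribution question in the list above, a quick count is usually enough to see whether the classes are balanced. The sketch below uses a synthetic 90/10 label list, not the poster's data, and the helper name `class_balance` is made up for the example.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, per-class shares, and the majority-class share."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    # Accuracy of a model that always predicts the most common class
    majority_baseline = max(shares.values())
    return counts, shares, majority_baseline


labels = [0] * 90 + [1] * 10  # synthetic example, not the poster's data
counts, shares, baseline = class_balance(labels)
print(counts, shares, baseline)
```

If the split is heavily skewed, accuracy alone is misleading, since always predicting the majority class already reaches the baseline. One common option in PyTorch is the `pos_weight` argument of `BCEWithLogitsLoss`.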
−1