Rishh3112
Rishh3112 OP t1_jdihvhj wrote
Reply to comment by what_if___420 in Cuda out of memory error by Rishh3112
Even after reducing the batch size to 1, I get the same error.
Rishh3112 OP t1_jdhti46 wrote
Reply to comment by MisterManuscript in Cuda out of memory error by Rishh3112
My model is posted below, and my batch size is only 8.
Rishh3112 OP t1_jdhsvfv wrote
Reply to comment by MisterManuscript in Cuda out of memory error by Rishh3112
I'm using AWS, and it has 14 GB of RAM.
Rishh3112 OP t1_jdhks1w wrote
Reply to comment by CKtalon in Cuda out of memory error by Rishh3112
My batch size is only 8. When I run it on the CPU of my laptop, which has 8 GB of RAM, it runs fine.
Rishh3112 OP t1_jdhiguj wrote
Reply to comment by trajo123 in Cuda out of memory error by Rishh3112
I just checked: in my training loop I'm using loss.item().
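For context, this is the pattern being checked for: a minimal sketch (with a made-up toy loss) showing why the running loss should be accumulated with .item() rather than as a tensor, since accumulating the raw loss tensor keeps every iteration's autograd graph alive and steadily grows memory.

```python
import torch

# Toy example: accumulate the loss as a plain Python float via .item()
# so each iteration's autograd graph can be freed. Accumulating the raw
# loss tensor instead would keep all the graphs alive across iterations.
x = torch.randn(4, 3, requires_grad=True)
running_loss = 0.0
for _ in range(3):
    loss = (x ** 2).mean()       # stand-in for a real training loss
    running_loss += loss.item()  # float, not a tensor; graph is released
```
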
Rishh3112 OP t1_jdhib79 wrote
Reply to comment by trajo123 in Cuda out of memory error by Rishh3112
Sure, I'll give it a try. Thanks a lot.
Rishh3112 OP t1_jdhi3xo wrote
Reply to comment by trajo123 in Cuda out of memory error by Rishh3112
I actually did, but the suggestions it gave were for general out-of-memory issues and didn't help much.
Rishh3112 OP t1_jdhd785 wrote
Reply to comment by humpeldumpel in Cuda out of memory error by Rishh3112
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, num_chars):
        super(CNN, self).__init__()
        # Convolution layers
        self.conv1 = nn.Conv2d(3, 128, kernel_size=(3, 6), padding=(1, 1))
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2))
        self.conv2 = nn.Conv2d(128, 64, kernel_size=(3, 6), padding=(1, 1))
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))
        # Dense layer
        self.fc1 = nn.Linear(768, 64)
        self.dp1 = nn.Dropout(0.2)
        # Recurrent layer (batch_first=True so the input can stay
        # (batch, seq, features); without it nn.GRU expects seq-first)
        self.lstm = nn.GRU(64, 32, bidirectional=True, batch_first=True)
        # Output layer
        self.output = nn.Linear(64, num_chars + 1)

    def forward(self, images, targets=None):
        bs, _, _, _ = images.size()
        x = F.relu(self.conv1(images))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = x.permute(0, 3, 1, 2)      # (bs, W, C, H)
        x = x.view(bs, x.size(1), -1)  # (bs, seq_len, features)
        x = F.relu(self.fc1(x))
        x = self.dp1(x)
        x, _ = self.lstm(x)
        x = self.output(x)
        x = x.permute(1, 0, 2)         # (seq_len, bs, classes) for CTC
        if targets is not None:
            log_probs = F.log_softmax(x, 2)
            input_lengths = torch.full(
                size=(bs,), fill_value=log_probs.size(0), dtype=torch.int32
            )
            target_lengths = torch.full(
                size=(bs,), fill_value=targets.size(1), dtype=torch.int32
            )
            loss = nn.CTCLoss(blank=0)(
                log_probs, targets, input_lengths, target_lengths
            )
            return x, loss
        return x, None

if __name__ == '__main__':
    model = CNN(74)
    img = torch.rand(config.BATCH_SIZE, 3, 50, 200)
    target = torch.randint(1, 20, (config.BATCH_SIZE, 5))
    x, loss = model(img, target)
    print(loss)
Rishh3112 OP t1_jdhbz4o wrote
Reply to comment by Old-Chemistry-7050 in Cuda out of memory error by Rishh3112
The model isn't very big, so that shouldn't be the problem.
Rishh3112 t1_iyd0opv wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
I would suggest splitting the dataset into chunks, saving the weights each time you finish training on one chunk, and training the next chunk starting from those saved weights.
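A minimal sketch of that idea, using made-up toy shards: the model and optimizer persist across chunks, so each chunk continues from the weights learned on the previous one, and a checkpoint of the state dict serves as the resume point.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup: a tiny model and three in-memory stand-ins for
# dataset chunks that would each be loaded from disk one at a time.
model = nn.Linear(10, 2)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

chunks = [TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
          for _ in range(3)]

for chunk in chunks:
    # Only the current chunk is in memory; weights carry over.
    for xb, yb in DataLoader(chunk, batch_size=8):
        optim.zero_grad()
        loss_fn(model(xb), yb).backward()
        optim.step()
    checkpoint = model.state_dict()  # saved after each chunk for resuming
```

Note that for truly large datasets, a streaming `torch.utils.data.IterableDataset` (or a `Dataset` that loads samples lazily from disk) is the more standard route, since it avoids manual chunk bookkeeping.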
Rishh3112 t1_iy3dnqd wrote
Reply to Deep Learning for Computer Vision: Workstation or some service like AWS? by Character-Ad9862
Hey, I work for a start-up, and our AI models are trained on AWS. I'd suggest an AWS server, since the company will need hosting later on anyway, and a cloud server is easier to maintain and manage. AWS security is solid, and hosting APIs is much easier from a cloud server. Training is also quicker in practice, because building an AWS-grade workstation yourself is expensive, and it's not just the hardware cost: power consumption is high too. When you compare the cost of AWS against a physical system in the office, the power bill largely offsets the cloud's monthly cost. A cloud system can also be used from anywhere in the world, whereas a physical office machine requires you to set up a VPN and keep it powered on whenever you want to use it. AWS charges only while you're using the server, plus a minimal monthly fee. In the long run, a physical system's upfront cost plus electricity, the VPN, and room cooling will cost you more than an AWS server. Hope you found this helpful.
Rishh3112 OP t1_jdl42av wrote
Reply to comment by j-solorzano in Cuda out of memory error by Rishh3112
Sure, I'll try running the garbage collector every epoch. Thanks.
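For reference, one common form of that suggestion is an end-of-epoch cleanup helper like the sketch below (the function name is made up). It assumes the loop has already dropped its references to per-batch tensors: gc.collect() frees Python-side cycles, and torch.cuda.empty_cache() returns cached blocks to the driver, though neither fixes a leak caused by tensors that are still referenced.

```python
import gc
import torch

def end_of_epoch_cleanup():
    # Free Python-side reference cycles that may hold tensors.
    gc.collect()
    # Return PyTorch's cached CUDA blocks to the driver (no-op on CPU).
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

end_of_epoch_cleanup()
```
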