debrises t1_j3itq4j wrote

Larger batch sizes give better gradient estimates, meaning optimizer steps tend to point in the "right" direction, which leads to faster convergence.
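A minimal sketch of that claim, on made-up toy data (all names and numbers here are illustrative, not from the comment): mini-batch gradients are noisy estimates of the full-batch gradient, and the noise shrinks as the batch grows.

```python
import torch

torch.manual_seed(0)

# Toy regression problem: the full-batch gradient is the "true" direction,
# mini-batch gradients are noisy estimates of it.
X, y = torch.randn(1024, 10), torch.randn(1024, 1)
w = torch.zeros(10, 1, requires_grad=True)

def grad_of(idx):
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()
    (g,) = torch.autograd.grad(loss, w)
    return g

full = grad_of(torch.arange(len(X)))
for bs in (8, 64, 512):
    g = grad_of(torch.randperm(len(X))[:bs])
    # Distance to the full-batch gradient shrinks as the batch size grows.
    print(bs, torch.norm(g - full).item())
```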

Do a test training run to see when your model converges, then train for slightly more epochs so the model can try to find different minima. And use a model checkpoint callback.
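For example, a checkpoint callback in PyTorch Lightning could look like the sketch below (the monitored metric name, filename pattern, and epoch count are assumptions for illustration); it keeps the best weights even if later epochs wander away from the minimum.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep only the single best checkpoint according to a logged validation metric.
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",   # assumes you log a metric called "val_loss"
    mode="min",
    save_top_k=1,
    filename="best-{epoch}-{val_loss:.3f}",
)

trainer = pl.Trainer(max_epochs=30, callbacks=[checkpoint_cb])
# trainer.fit(model, datamodule=dm)  # model/datamodule defined elsewhere
```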

As for the loss, just use an optimizer from the Adam family, like AdamW. It handles most of the problems that can come up pretty well.
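Hooking AdamW up in PyTorch is a one-liner; the model and hyperparameters below are just stand-ins.

```python
import torch
from torch import nn

model = nn.Linear(16, 1)  # toy stand-in model, for illustration only
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```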

The learning rate heavily depends on what range of values your loss takes. Think about it this way: if your loss is 10, then a learning rate of 0.01 scales that signal to 10 * 0.01 = 0.1. We compute partial derivatives of this with respect to each weight, backpropagate them, and update the weights. Usually we want the weights to have small values centered around zero, and to update them by even smaller amounts each step. The point is that your model doesn't know what values your loss takes, so you have to tune the learning rate to find the value that connects your loss signal to your weights at the right scale.
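A tiny worked example of that scaling, with made-up numbers: the gradient magnitude follows the loss magnitude, and the learning rate is what turns it into a weight update of a sensible size.

```python
import torch

# One weight, one data point, squared error (all values are arbitrary).
w = torch.tensor([0.3], requires_grad=True)
x, y = torch.tensor([2.0]), torch.tensor([5.0])

loss = ((w * x - y) ** 2).mean()
loss.backward()                      # d(loss)/dw scales with the loss magnitude

lr = 0.01
with torch.no_grad():
    update = lr * w.grad             # bigger loss -> bigger gradient -> tune lr down
    w -= update

print(loss.item(), w.grad.item(), update.item())
```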

1

debrises t1_j3irf9s wrote

>What are techniques or best practices for detecting/segmenting large objects in high resolution images? Some problems I run into are training with large image chip sizes

The first thing that comes to mind is gradient accumulation if you have limited GPU memory. Fitting images of that size on a single GPU can force a very small batch size, which is not great for training speed or stability. A manual version is sketched below.
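A self-contained sketch of doing it by hand in plain PyTorch, on toy data (the model, loss, and batch counts are all illustrative): gradients from several small batches are summed before a single optimizer step, giving a larger effective batch size without the memory cost.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)),
                    batch_size=2)

accum_steps = 8  # effective batch size = 2 * 8 = 16

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    (loss / accum_steps).backward()   # scale so accumulated grads average out
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # one update per accum_steps micro-batches
        optimizer.zero_grad()
```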

PyTorch Lightning offers this as a built-in feature if you're using PyTorch.
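In Lightning it's a single Trainer argument; the numbers below are placeholders you'd tune to your GPU.

```python
import pytorch_lightning as pl

# Accumulate 8 micro-batches before each optimizer step, so an effective
# batch of 8 images never has to fit in GPU memory at once.
trainer = pl.Trainer(accumulate_grad_batches=8, max_epochs=50)
# trainer.fit(model, datamodule=dm)  # model/datamodule defined elsewhere
```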

1