
Delacroid t1_jb4c3xt wrote

I don't think so. If you look at the figure and check the angle between the whole-dataset backprop and the minibatch backprop, increasing the learning rate wouldn't change that angle, only the scale of the vectors.
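If it helps, here's a quick NumPy toy of what I mean (the "gradients" are just random stand-ins, not real backprop): the cosine of the angle between the two directions doesn't move when you scale one of them by a learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
g_full = rng.normal(size=100)            # stand-in for the whole-dataset gradient
g_mini = g_full + rng.normal(size=100)   # stand-in for a noisy minibatch gradient

def cos_angle(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for lr in (0.01, 0.1, 1.0):
    # Same cosine for every lr: scaling changes length, not direction.
    print(lr, cos_angle(g_full, lr * g_mini))
```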

Also, dropout does not (only) introduce noise; it prevents co-adaptation of neurons. In the same way that in a random forest each tree is trained on a subset of the data (bootstrapping, I think it's called), the same happens for neurons when you use dropout.
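To make the analogy concrete, here's a rough sketch of (inverted) dropout in NumPy; the layer shape and drop rate are made up for illustration, not any particular library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    # Inverted dropout: each unit is kept with probability 1 - p.
    # Every forward pass trains a different random sub-network,
    # loosely like each tree in a random forest seeing a bootstrap sample.
    if not train:
        return x
    mask = rng.random(x.shape) >= p   # random subset of surviving units
    return x * mask / (1.0 - p)       # rescale so the expected activation matches

h = rng.normal(size=(4, 8))           # toy layer activations
print(dropout(h))
```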

I haven't read the paper, but my intuition says that the merit of dropout in the early stages of training could be that the bootstrapping is reducing the bias of the model. That's why the direction of optimization is closer to the whole-dataset training direction.


Delacroid OP t1_iseuqrb wrote

Thank you very much, because I didn't know that super resolution also dealt with interpolation. I thought it was only for improving image quality, e.g. going from 360p to 720p. I will try to use the term in my search and see what I get.
