Delacroid
Delacroid t1_j9itr09 wrote
Reply to comment by vladosaurus in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
Well, that would be an amazing post to read: how often does it get math questions right, but with a statistically significant number of samples, so that we can actually compare against the state of the art, such as Galactica.
Delacroid t1_j9grwzu wrote
Reply to comment by vladosaurus in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
I'll admit that my comment may come off as elitist, but I think that you have to admit that this was a very low effort post. Maybe a more correct sub for this post would have been r/learnmachinelearning.
Delacroid t1_j9flphj wrote
Did you really need to numerically compute the gradient to check it was OK? Dude, this is high-school math.
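For context, the kind of numerical check being dismissed here is a finite-difference comparison against the hand-derived derivative. A minimal sketch (toy function and tolerance are my own, not from the thread):

```python
def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

f = lambda x: x ** 3           # example function
analytic = lambda x: 3 * x ** 2  # its first-order derivative, by hand

x = 2.0
# The two should agree up to discretization error.
assert abs(numeric_grad(f, x) - analytic(x)) < 1e-4
```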
Delacroid t1_ivslx97 wrote
Reply to comment by [deleted] in [Discussion] Could someone explain the math behind the number of distinct images that can be generated with a latent diffusion model? by [deleted]
Yes but you can make the latent dimension as high as you want.
Delacroid OP t1_iseztbz wrote
Reply to comment by tdgros in [D] Interpolation in medical imaging? by Delacroid
Thanks, I never thought about the analogy with an inpainting problem.
Edit: grammar
Delacroid OP t1_iseuqrb wrote
Reply to comment by eigenham in [D] Interpolation in medical imaging? by Delacroid
Thank you very much, because I didn't know that super resolution also dealt with interpolation. I thought it was only for improving image quality, e.g. going from 360p to 720p. I will try to use the term in my search and see what I get.
Submitted by Delacroid t3_y4kvq2 in MachineLearning
Delacroid t1_jb4c3xt wrote
Reply to comment by jobeta in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
I don't think so. If you look at the figure and check the angle between whole-dataset backprop and minibatch backprop, increasing the learning rate wouldn't change that angle, only the scale of the vectors.
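To illustrate the point with toy numbers (the gradient vectors below are made up, not from the paper): scaling both gradients by a learning rate leaves the angle between them unchanged.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

full_grad = np.array([1.0, 2.0])  # toy whole-dataset gradient
mini_grad = np.array([1.5, 1.0])  # toy minibatch gradient

lr = 10.0
# A learning rate rescales the vectors but does not rotate them,
# so the angle (cosine similarity) is identical.
assert np.isclose(cosine(full_grad, mini_grad),
                  cosine(lr * full_grad, lr * mini_grad))
```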
Also, dropout does not (only) introduce noise; it prevents co-adaptation of neurons. In the same way that in a random forest each tree is trained on a subset of the data (bootstrapping, I think it's called), the same happens for neurons when you use dropout.
I haven't read the paper, but my intuition says that the merit of dropout in the early stages of training could be that the bootstrapping is reducing the bias of the model. That's why the direction of optimization is closer to the whole-dataset training.
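The analogy above can be sketched in code: each forward pass with dropout trains a random "sub-network", much like each tree in a random forest sees a random subset. A minimal inverted-dropout sketch (the layer shape and drop rate are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5):
    """Zero each unit with probability p; rescale survivors by 1/(1-p).

    Every call samples a different mask, so each training step updates
    a different random sub-network of neurons.
    """
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones(8)
# Roughly half the units are zeroed; survivors are scaled up to keep
# the expected activation unchanged.
print(dropout(h))
```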