Submitted by AutoModerator t3_z07o4c in MachineLearning
danman966 t1_ixgzwfh wrote
Reply to comment by BBAAQQDDD in [D] Simple Questions Thread by AutoModerator
Backpropagation is essentially applying the chain rule a bunch of times. Neural nets are just basic functions applied one on top of another to an input x to get some output z, e.g. z = f(g(h(x))), so the derivative of z with respect to the parameters of f, g, and h is just the chain rule applied once per level of nesting. PyTorch and TensorFlow know the derivative of each building block they provide (activation functions, linear layers, and so on) and record which operations were applied in the forward pass, so it is easy for the software to compute each gradient automatically.
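To make the z = f(g(h(x))) picture concrete, here is a minimal sketch in plain Python. The specific choices are my own assumptions for illustration, not anything from a particular library: h(x) = w1 * x, g(u) = tanh(u + b), f(v) = w2 * v. The hand-written chain-rule gradients are checked against a finite-difference estimate.

```python
import math

def forward(x, w1, b, w2):
    u = w1 * x            # h: linear step (hypothetical choice)
    v = math.tanh(u + b)  # g: tanh activation with a bias
    z = w2 * v            # f: output scaling
    return u, v, z

def backward(x, w1, b, w2):
    """Gradients of z w.r.t. each parameter via the chain rule."""
    u, v, z = forward(x, w1, b, w2)
    dz_dv = w2            # derivative of f w.r.t. its input
    dv_du = 1.0 - v * v   # derivative of tanh at (u + b)
    return {
        "w2": v,                  # dz/dw2: f's own parameter
        "b": dz_dv * dv_du,       # dz/db: through f, then g
        "w1": dz_dv * dv_du * x,  # dz/dw1: through f, g, then h
    }

def numeric_grad(name, x, params, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    hi = dict(params)
    hi[name] += eps
    lo = dict(params)
    lo[name] -= eps
    return (forward(x, **hi)[2] - forward(x, **lo)[2]) / (2 * eps)

params = {"w1": 0.5, "b": -0.2, "w2": 1.5}
x = 2.0
grads = backward(x, **params)
for name in params:
    print(name, grads[name], numeric_grad(name, x, params))
```

The deeper a parameter sits in the nesting, the more local derivatives get multiplied together. An autodiff framework does the same bookkeeping for you, just for arbitrary graphs instead of three hand-picked functions.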
We need the gradient, of course, because that is how we update our parameter values: gradient descent (or something similar) moves each parameter a small step against its gradient so that the loss goes down.
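As a rough sketch of that update step, here is plain gradient descent on a one-parameter toy problem. The squared-error loss, the data point (x, y), and the learning rate are all assumptions picked for illustration.

```python
def loss(w, x, y):
    # Squared error of a one-parameter model w * x (toy example)
    return (w * x - y) ** 2

def grad(w, x, y):
    # d(loss)/dw, obtained with the chain rule
    return 2.0 * (w * x - y) * x

w = 0.0          # initial parameter value
x, y = 2.0, 6.0  # single training example; the best w is 3.0
lr = 0.1         # learning rate (step size)

for step in range(50):
    w -= lr * grad(w, x, y)  # step against the gradient

print(w, loss(w, x, y))
```

In a real network the only difference is that w is millions of parameters and the gradient comes from backpropagation rather than a hand-written formula.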