
pyfreak182 t1_j9slq8a wrote

It helps that the math behind backpropagation (i.e. matrix multiplications) is easily parallelizable. The computations in the forward pass are independent across training examples, so a whole batch can be processed in parallel. The same is true for the backward pass, where each example's gradient contribution can be computed independently and then summed over the batch.
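As a rough sketch (with made-up layer sizes, using NumPy), both passes for a single linear layer reduce to matrix multiplies over the whole batch at once:

```python
import numpy as np

# Hypothetical sizes: batch of 64 examples, 128 input features, 32 output units.
batch, d_in, d_out = 64, 128, 32
X = np.random.randn(batch, d_in)   # one row per training example
W = np.random.randn(d_in, d_out)   # layer weights

# Forward pass: one matrix multiply handles every example in the batch.
Y = X @ W                          # shape (batch, d_out)

# Backward pass: given the upstream gradient dL/dY, the weight gradient is
# another matrix multiply that sums each example's independent contribution.
dY = np.random.randn(batch, d_out) # stand-in for dL/dY
dW = X.T @ dY                      # shape (d_in, d_out)
dX = dY @ W.T                      # per-example gradient w.r.t. the inputs
```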

And we have hardware accelerators like GPUs that are designed to perform large amounts of parallel computations efficiently.
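For illustration, the same kind of matmul dispatched to a GPU (assuming PyTorch and a CUDA device are available; falls back to CPU otherwise), where both passes run as parallel kernels:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
X = torch.randn(64, 128, device=device)
W = torch.randn(128, 32, device=device, requires_grad=True)

Y = X @ W            # forward matmul runs as one parallel kernel
Y.sum().backward()   # backward pass launches the corresponding parallel kernels
print(W.grad.shape)  # torch.Size([128, 32])
```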

The success of deep learning is just as much about implementation as it is about theory.
