Submitted by hardmaru t3_ys36do in MachineLearning
master3243 t1_iw1x35r wrote
Reply to comment by VinnyVeritas in [R] ZerO Initialization: Initializing Neural Networks with only Zeros and Ones by hardmaru
> They theoretically show that, different from naive identity mapping, their initialization methods can avoid training degeneracy when the network dimension increases. In addition, they empirically show that they can achieve better performance than random initializations on image classification tasks, such as CIFAR-10 and ImageNet. They also show some nice properties of the model trained by their initialization methods, such as low-rank and sparse solutions.
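For intuition, here is a minimal sketch of the "naive identity mapping" baseline the quote contrasts against: a linear layer whose weights contain only zeros and ones so that it computes the identity at init. This is *not* the paper's full ZerO procedure (which additionally handles dimension-changing layers, via Hadamard transforms, to avoid the degeneracy mentioned above); it only illustrates the basic idea.

```python
import torch
import torch.nn as nn

def identity_init(layer: nn.Linear) -> None:
    # Weights become a (partial) identity matrix: only zeros and ones.
    nn.init.eye_(layer.weight)
    if layer.bias is not None:
        nn.init.zeros_(layer.bias)

layer = nn.Linear(8, 8)
identity_init(layer)
x = torch.randn(2, 8)
print(torch.allclose(layer(x), x))  # True: the layer is the identity at init
```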
VinnyVeritas t1_iw9ajwe wrote
The performance is not better: the results are the same within the margin of error for standard (i.e., not super-deep) networks. Copied from their table:
| Dataset | ZerO Init | Kaiming Init |
|---|---|---|
| CIFAR-10 | 5.13 ± 0.08 | 5.15 ± 0.13 |
| ImageNet | 23.43 ± 0.04 | 23.46 ± 0.07 |
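The overlap is easy to verify. Assuming the ± values are symmetric error bars around the mean error (%), the gap between the means is smaller than the combined error on both datasets:

```python
# Quick sanity check that the reported intervals overlap
# (assumes the "±" values are symmetric error bars around the mean).
results = {
    "CIFAR-10": ((5.13, 0.08), (5.15, 0.13)),   # (ZerO, Kaiming)
    "ImageNet": ((23.43, 0.04), (23.46, 0.07)),
}
for name, ((m1, e1), (m2, e2)) in results.items():
    overlap = abs(m1 - m2) <= e1 + e2
    print(f"{name}: |mean gap| = {abs(m1 - m2):.2f}, "
          f"combined error = {e1 + e2:.2f}, overlap = {overlap}")
```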