Submitted by _Arsenie_Boca_ t3_118cypl in MachineLearning
Many neural architectures include bottleneck layers somewhere in the network. By bottleneck I mean projecting activations down to a lower dimension and then back up, as is done, for example, in ResNet bottleneck blocks.
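For concreteness, here is a minimal sketch of the kind of block I mean (a ResNet-style bottleneck in PyTorch; batch norm is omitted and the channel sizes are just illustrative, not taken from any particular paper):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Illustrative ResNet-style bottleneck: project down, transform, project back up."""

    def __init__(self, channels: int = 256, bottleneck: int = 64):
        super().__init__()
        # 1x1 conv projects activations down to the bottleneck dimension
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)
        # 3x3 conv does the "real work" in the lower-dimensional space
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)
        # 1x1 conv projects back up to the original dimension
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv(out))
        out = self.expand(out)
        # residual connection around the bottleneck, as in ResNet
        return self.relu(out + x)
```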
What is your intuition on why this is beneficial? From an information-theory standpoint, the lower dimensionality creates potential information loss. Can we see this as a form of regularisation that forces the model to learn more meaningful representations?
I'm interested in your intuitions on this, or in empirical results that might support them. Are you aware of other works that use bottlenecks, and what is their underlying reasoning?
Professional_Poet489 t1_j9gh652 wrote
The theory is that bottlenecks act as a compression / regularization mechanism. If the bottleneck is much narrower than the rest of the net and you still get high-quality results at the output, then the bottleneck layer must be capturing the information required to produce those correct results. The fact that these intermediate layers are often used as embeddings indicates that this is a real phenomenon.
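As a rough sketch of that embedding use (here with torchvision's ResNet-50 and a standard forward hook; which bottleneck you hook and how you pool are arbitrary choices for the example):

```python
import torch
from torchvision.models import resnet50

# Randomly initialized model; fine for showing the mechanism.
model = resnet50(weights=None).eval()
captured = {}

def grab(module, inputs, output):
    # Save the low-dimensional activations produced at the narrow point of the block.
    captured["embedding"] = output.detach()

# conv1 inside a torchvision Bottleneck block is the 1x1 projection down to the reduced width.
handle = model.layer4[-1].conv1.register_forward_hook(grab)

with torch.no_grad():
    _ = model(torch.randn(1, 3, 224, 224))
handle.remove()

# Pool spatially to get one embedding vector per image.
embedding = captured["embedding"].mean(dim=(2, 3))  # shape: (1, 512)
```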