
grid_world OP t1_ischbjf wrote

I don't think the Gaussians are being output by a layer. In contrast to a plain autoencoder, where a sample is encoded to a single point, in a VAE the Gaussian prior means a sample is encoded as a Gaussian distribution. This is the regularisation effect that enforces this distribution on the latent space. It cuts both ways: if the true manifold is not Gaussian, we still assume, and therefore force, it to be Gaussian.
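To be concrete about what I mean: the encoder outputs the *parameters* of a per-sample Gaussian rather than Gaussian values themselves. A minimal PyTorch sketch (layer sizes and names are just illustrative, not from any particular implementation):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input to the parameters of a diagonal Gaussian q(z|x)."""
    def __init__(self, in_dim=784, hidden=256, latent_dim=16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(hidden, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def sample_z(mu, logvar):
    # Reparameterisation: each sample is drawn from its own Gaussian,
    # not mapped to a single fixed point as in a plain autoencoder.
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)
```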

A Gaussian signal being meaningful is something I wouldn't count on. Diffusion models are a stark contrast, but we aren't talking about them here. The further a signal is from a standard Gaussian, the more information it is trying to smuggle through the bottleneck.
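If you wanted to put a number on "how far from a standard Gaussian" each dimension is, one rough proxy is the per-dimension KL term against the N(0, 1) prior. A sketch, assuming you already have the encoder's mean and log-variance for a batch (names are mine):

```python
import torch

def per_dim_kl(mu, logvar):
    """Average KL(q(z_i|x) || N(0, 1)) per latent dimension over a batch.

    Dimensions whose average KL stays near zero are (approximately) just the
    prior, i.e. they are not smuggling information through the bottleneck.
    """
    # Closed-form KL between a diagonal Gaussian and the standard normal
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)
    return kl.mean(dim=0)  # shape: (latent_dim,)
```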

I didn't get your point about looking at the decoder weights to figure out whether they are contributing. Do you compare them to their randomly initialised values to infer this?

3

The_Sodomeister t1_isci8nh wrote

This is the part I'm referencing:

> The unneeded dimensions don't learn anything meaningful and therefore remain a standard multivariate Gaussian distribution. This serves as a signal that such dimensions can safely be removed without significantly impacting the model's performance.

How do you explicitly measure and use this "signal"? I don't think you'd get far by just measuring "distance from a standard Gaussian", as you'd almost certainly end up throwing away useful dimensions that simply appear "Gaussian enough".

> I didn't get your point about looking at the decoder weights to figure out whether they are contributing. Do you compare them to their randomly initialised values to infer this?

If the model reaches this "optimal state" where certain dimensions aren't contributing to the decoder output, then you should be able to detect this with some form of sensitivity analysis - i.e. if those dimensions aren't being used, changing their values shouldn't affect the decoder output.
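Something along these lines, as a sketch, assuming a trained `decoder` and a batch of encoded latents `z` (the perturbation size is arbitrary):

```python
import torch

@torch.no_grad()
def dimension_sensitivity(decoder, z, eps=1.0):
    """Mean change in decoder output when each latent dimension is nudged.

    Dimensions with near-zero sensitivity are effectively ignored by the decoder.
    """
    base = decoder(z)
    scores = []
    for d in range(z.shape[1]):
        z_pert = z.clone()
        z_pert[:, d] += eps  # perturb a single dimension
        scores.append((decoder(z_pert) - base).abs().mean().item())
    return scores  # one sensitivity score per latent dimension
```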

This assumes that the model would correctly learn to ignore unnecessary latent dimensions, but I'm not confident it would actually accomplish that.

3

grid_world OP t1_isduwtm wrote

Retraining the model with reduced latent dimensions would be a *rough* way of *proving* this, but the stochastic behavior of neural network training makes it hard to draw firm conclusions from a single run.
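Roughly, assuming some hypothetical `train_vae(latent_dim, seed)` routine that returns a validation reconstruction loss, the best you could do is average over a few seeds to dampen the run-to-run noise:

```python
import statistics

def compare_latent_sizes(train_vae, dims=(32, 16, 8), seeds=(0, 1, 2)):
    """Retrain at each latent size and report mean +/- std of validation loss.

    `train_vae` is a placeholder for your own training routine; averaging over
    seeds only partially addresses the stochasticity problem.
    """
    for d in dims:
        losses = [train_vae(latent_dim=d, seed=s) for s in seeds]
        print(f"latent_dim={d}: {statistics.mean(losses):.4f} "
              f"+/- {statistics.stdev(losses):.4f}")
```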

1