Comments

You must log in or register to comment.

behold_s t1_ivotsdx wrote

Today I just watched the video about Stable Diffusion on 2minutes paper YT channel. I thought İ I understood the core of the subject, until I see this post 😂. What is one awesome thing you think this advancements will possibly lead up to?

−3

dojoteef t1_ivou9rr wrote

No one can answer that question, since not all possible output images are equally probable (some are even impossible given trained network weights). You might be able to make an empirical estimate, but enumerating the true output space of any sufficiently complex NN is an open problem.

8

dasayan05 t1_ivpmx7r wrote

There is no way to feasibly compute what you are asking for.

Diffusion Models (in fact any modern generative model) are defined on continuous image-space, i.e. a continuous vector of 512x512 length. This space is not discrete, so there isn't even any notion of "distinct images". A tiny continuous change can lead to another plausible image and there are (theoretically) infinitely many tiny change you can apply on an image to produce another image that looks same but isn't the same point in image space.

The (theoretically) correct answer to your question would be that there are infitiely many images you can sample from a given generative model.

17

Nmanga90 t1_ivpsuae wrote

Well if we are talking just about the output of any diffusion model, being 512x512 pixels we get 262,144 pixels.

Each pixel can have a range of 0-255 for R,G, and B

256^3 = 16,777,217 (possible combinations for each pixel)

So then we get 16,777,217 ^ 262,144

This is an unimaginably large number, but it’s important to note that many of these images will appear to be exactly the same as one another due to our perception of color.

3

bloc97 t1_ivpzu4j wrote

Theoretically, the upper bound of distinct images is proportional to the number of bits required to encode each latent, thus a 64x64x4 latent encoded as a 32-bit number would amount to (2^32)^(64x64x4) images. However, many of those combinations are not considered to be "images" (they are "out of distribution"), thus the real number might be much much smaller than this, depending on the dataset and the network size.

11

bloc97 t1_ivqgf0q wrote

I was considering an unconditional latent diffusion model, but for conditional models, the computation becomes much more complex (we might have to use bayes here). If we use Score-Based Generative Modeling (https://arxiv.org/abs/2011.13456), we could try to find and count all the unique local minima and saddle points, but it is not clear how we can do this...

3

tripple13 t1_ivqpwft wrote

This.

I guess that's whats remarkably fascinating by these models.

Albeit, in essence, you are putting a prior on the training set, thus there should be some limit to the manifold from which samples are generated.

6