uhules t1_j6wrx63 wrote on February 2, 2023 at 1:16 PM

Reply to comment by Mefaso in [D] Why is stable diffusion much smaller than predecessors? by dahdarknite

Except DALL-E 2 also applies diffusion in latent space, and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total parameter count and matter more for speed. The unsatisfying answer is simply "SD does latent better", since properly comparing such different architectures would take an extensive ablation study.
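To put rough numbers on the cascade idea, here's a minimal Python sketch that counts how many values each stage of a pixel-space cascade denoises per step. It assumes a 64x64 RGB base model followed by upsamplers to 256x256 and 1024x1024, which roughly matches the resolutions reported for Imagen and for DALL-E 2's decoder. The helper names are made up for illustration, and values per step say nothing direct about parameter count.

```python
def num_values(height: int, width: int, channels: int) -> int:
    """Number of values a diffusion model denoises per step for one image tensor."""
    return height * width * channels

# Assumed cascade (hypothetical names): a 64x64 base model, then two
# super-resolution stages, all operating on RGB pixels (3 channels).
CASCADE = [("base", 64), ("upsampler 1", 256), ("upsampler 2", 1024)]

def cascade_sizes(stages):
    """Return (stage name, side length, values per denoising step) for each stage."""
    return [(name, side, num_values(side, side, 3)) for name, side in stages]

if __name__ == "__main__":
    for name, side, n in cascade_sizes(CASCADE):
        print(f"{name:12s} {side:5d}x{side:<5d} {n:>10,d} values")
```

The later stages work on much larger tensors per step, which fits the point that the upsamplers matter more for speed than their share of the parameters would suggest.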

Mefaso t1_j6z6zgt wrote on February 2, 2023 at 10:43 PM

>DALL-E 2 also applies diffusion in latent space

Not really, at least not in the important part. DALL-E 2 uses diffusion in CLIP "latent" space and then conditions the pixel-diffusion model on the result.

However, they still do a full diffusion pass in pixel space, which is more complex than doing it in latent space, as LDMs do.
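A quick way to see the gap: the sketch below compares how many values a denoiser handles per step when working directly on a 512x512 RGB image versus in a latent space. It assumes an SD v1-style autoencoder with 8x spatial downsampling and 4 latent channels; the function names are hypothetical, only per-step tensor size is counted (not parameters), and the autoencoder's own encode/decode cost is ignored.

```python
# Assumed SD v1-style autoencoder settings (illustrative, not read from any library).
DOWNSAMPLE = 8        # spatial downsampling factor f
LATENT_CHANNELS = 4   # latent channels c

def pixel_values(side: int) -> int:
    """Values a pixel-space model denoises per step for a side x side RGB image."""
    return side * side * 3

def latent_values(side: int, f: int = DOWNSAMPLE, c: int = LATENT_CHANNELS) -> int:
    """Values an LDM denoises per step when the autoencoder maps
    side x side x 3 pixels to (side/f) x (side/f) x c latents."""
    if side % f:
        raise ValueError("side must be divisible by the downsampling factor")
    return (side // f) ** 2 * c

if __name__ == "__main__":
    side = 512
    px, lat = pixel_values(side), latent_values(side)
    print(f"pixel space : {px:,d} values")
    print(f"latent space: {lat:,d} values ({px / lat:.0f}x fewer)")
```

At 512x512 this works out to 786,432 versus 16,384 values, i.e. 48x fewer per denoising step for the latent model under these assumptions.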