Viewing a single comment thread. View all comments

uhules t1_j6wrx63 wrote

Except DALL-E 2 also applies diffusion in latent space and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and are more relevant speed-wise. The lackluster explanation is simply "SD does latent better", since you'd need to do an extensive ablation study to compare rather different architectures.

4

Mefaso t1_j6z6zgt wrote

>DALL-E 2 also applies diffusion in latent space

Not really in the important part. Dalle2 uses diffusion in clip-"latent"-space and then conditions the pixel-diffusion model on the result.

However they still do a full diffusion pass in pixel-space, which is more complex than doing it in latent space, as LDMs do.

1