Submitted by dahdarknite t3_10r5gku in MachineLearning
uhules t1_j6wrx63 wrote
Reply to comment by Mefaso in [D] Why is stable diffusion much smaller than predecessors? by dahdarknite
Except DALL-E 2 also applies diffusion in latent space and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and are more relevant speed-wise. The lackluster explanation is simply "SD does latent better", since you'd need to do an extensive ablation study to compare rather different architectures.
Mefaso t1_j6z6zgt wrote
>DALL-E 2 also applies diffusion in latent space
Not really in the important part. DALL-E 2 uses diffusion in CLIP-"latent"-space and then conditions the pixel-diffusion model on the result.
However, they still do a full diffusion pass in pixel space, which is more expensive than doing it in latent space, as LDMs do.
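To put a rough number on the pixel-vs-latent gap, here's a back-of-the-envelope sketch. It assumes a 512x512 RGB image and an autoencoder with 8x spatial downsampling and 4 latent channels (the values commonly cited for SD v1; the function names are made up for illustration). It only counts the elements the denoiser has to process per step, which is a crude proxy for compute and says nothing about parameter count. Note also that DALL-E 2 and Imagen run their base diffusion at low resolution and upsample afterwards, so this isn't a like-for-like comparison of those systems.

```python
def tensor_elements(height, width, channels):
    """Number of values the denoiser sees per sample at each step."""
    return height * width * channels


def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape of the latent produced by an autoencoder (assumed SD v1-style values)."""
    return height // downsample, width // downsample, latent_channels


# Pixel-space diffusion at full resolution: 512 x 512 x 3
pixel = tensor_elements(512, 512, 3)

# Latent-space diffusion: 64 x 64 x 4 under the assumptions above
lh, lw, lc = latent_shape(512, 512)
latent = tensor_elements(lh, lw, lc)

print(f"pixel elements:  {pixel}")
print(f"latent elements: {latent}")
print(f"ratio:           {pixel / latent:.0f}x")
```

Under those assumptions the latent is about 48x smaller than the full-resolution pixel tensor, which is where most of the per-step savings come from.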