Beautiful-Gur-9456 OP t1_je3hxbn wrote on March 29, 2023 at 3:27 AM

The training pipeline, honestly, is significantly simpler without adversarial training, so the design space is much smaller.

It's actually reminiscent of GANs since it uses pre-trained networks as a loss function to improve the quality, though it's completely optional. Still, it's a lot easier than trying to solve any kind of minimax problem.

geekfolk t1_je3io3b wrote on March 29, 2023 at 3:33 AM

Using pretrained models is kind of cheating; some GANs use this trick too (Projected GANs). But as a standalone model, it does not seem to work as well as SOTA GANs (judging by the numbers in the paper).

>Still, it's a lot easier than trying to solve any kind of minimax problem.

This was true for GANs in the early days; however, modern GANs have been proven not to suffer from mode collapse, and their training has been proven to converge.

>It's actually reminiscent of GANs since it uses pre-trained networks

I assume you mean distilling a diffusion model, as in the paper. There have been some attempts to combine diffusion and GANs to get the best of both worlds, but afaik none involved distillation. I'm curious if anyone has tried distilling diffusion models into GANs.

Beautiful-Gur-9456 OP t1_je3qsdu wrote on March 29, 2023 at 4:52 AM

Nope. I mean the LPIPS loss, which kinda acts like a discriminator in GANs. We can replace it with MSE without much degradation.
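To make the "swap LPIPS for MSE" point concrete, here is a minimal sketch of a loss with a pluggable distance. It is not the repo's actual code: `toy_features` is a made-up stand-in for a frozen pretrained network such as the VGG behind LPIPS, and `consistency_loss` is a hypothetical name used only for illustration.

```python
from typing import Callable, List

Vector = List[float]
Distance = Callable[[Vector, Vector], float]


def mse(a: Vector, b: Vector) -> float:
    """Plain mean squared error in input ("pixel") space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(v: Vector) -> Vector:
    """Hypothetical frozen feature extractor: differences between neighbours.

    Stand-in for a pretrained network; it responds to local structure
    and ignores a constant brightness offset.
    """
    return [v[i + 1] - v[i] for i in range(len(v) - 1)]


def make_feature_distance(features: Callable[[Vector], Vector]) -> Distance:
    """Build a distance that compares inputs in the extractor's feature space."""
    def distance(a: Vector, b: Vector) -> float:
        return mse(features(a), features(b))
    return distance


def consistency_loss(pred_online: Vector, pred_target: Vector, distance: Distance) -> float:
    """The loss only needs *some* distance between the two predictions."""
    return distance(pred_online, pred_target)


a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]  # same structure as `a`, shifted by a constant

pixel_loss = consistency_loss(a, b, mse)
feature_loss = consistency_loss(a, b, make_feature_distance(toy_features))
```

The two distances disagree on the same pair of inputs: the pixel-space loss sees the offset, while the feature-space loss only sees structure. That is roughly why a pretrained metric can help quality, and also why the rest of the pipeline does not change when you swap it out.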

Distilling a SOTA diffusion model is obviously cheating 😂, so I didn't even think of it. In my view, they are just apples and oranges. We can augment diffusion models with GANs and vice versa to get the most out of them, but what's the point? That would make things way more complex. It's clear that diffusion models cannot beat SOTA GANs for one-step generation; GANs have been tailored for that particular task for years. But we're just exploring possibilities, right?

Aside from the complexity, I think it's worth a shot to replace LPIPS loss and adversarially train it as a discriminator. Using a pre-trained VGG is cheating anyway. That would be an interesting direction to explore!

geekfolk t1_je59x39 wrote on March 29, 2023 at 2:49 PM

>I think it's worth a shot to replace LPIPS loss and adversarially train it as a discriminator

that would be very similar to this: https://openreview.net/forum?id=HZf7UbpWHuA

Beautiful-Gur-9456 OP t1_je5p8bu wrote on March 29, 2023 at 4:28 PM

was that a thing? lmao 🤣

[deleted] t1_je3ah6y wrote on March 29, 2023 at 2:26 AM

[deleted]

Username912773 t1_je3pipz wrote on March 29, 2023 at 4:39 AM

Aren’t GANs substantially larger, and don’t they have a harder time preserving image structure?

geekfolk t1_je3qyfr wrote on March 29, 2023 at 4:54 AM

I don’t know about this model, but GANs are typically smaller than diffusion models in terms of num of params. The image structure thing probably has something to do with the network architecture since GANs rarely use attention blocks and the network architecture of diffusion models is more hybrid (typically CNN + attention)

Beautiful-Gur-9456 OP t1_je3sung wrote on March 29, 2023 at 5:15 AM

I think the reason lies in the difference in the amount of computation rather than architectural difference. Diffusion models have many chances to correct their predictions, but GANs do not.

huehue9812 t1_je3q0pl wrote on March 29, 2023 at 4:44 AM

Hey, can I ask something about 0-GP GANs? This is the first time I've ever heard of them. I was wondering what makes them superior to R1 regularization. Also, why is it that most papers mention R1 reg., but not 0-GP?

geekfolk t1_je3qiqw wrote on March 29, 2023 at 4:49 AM

R1 is one form of 0-GP; it was actually introduced in the paper that proposed 0-GP. See my link above.
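For reference, a sketch of the usual definitions. The notation is mine rather than copied from the linked paper: $D$ is the discriminator, $\gamma$ and $\lambda$ are penalty weights, and $\hat{x}$ is whatever point the penalty is evaluated at (real samples, fake samples, or interpolates between them).

```latex
% R1: zero-centered gradient penalty evaluated on real data only
R_1(D) = \frac{\gamma}{2}\,
  \mathbb{E}_{x \sim p_{\mathrm{data}}}
  \left[ \lVert \nabla_x D(x) \rVert_2^2 \right]

% Generic zero-centered gradient penalty (0-GP): push the gradient norm toward 0
\mathcal{L}_{0\text{-GP}}(D) = \lambda\,
  \mathbb{E}_{\hat{x}}
  \left[ \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2^2 \right]

% For contrast, the WGAN-GP penalty is one-centered: push the gradient norm toward 1
\mathcal{L}_{1\text{-GP}}(D) = \lambda\,
  \mathbb{E}_{\hat{x}}
  \left[ \left( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \right)^2 \right]
```

So R1 is the zero-centered penalty with $\hat{x}$ restricted to real samples and $\lambda = \gamma / 2$.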

[P] Consistency: Diffusion in a Single Forward Pass 🚀

geekfolk t1_je23p7c wrote on March 28, 2023 at 9:14 PM