
SleekEagle OP t1_iu569xx wrote

I don't think the paper explicitly says anything about this, but I would expect them to be similar. If anything, I would imagine they would require less memory, not more. That said, if you're thinking of e.g. DALL-E 2 or Stable Diffusion, those models also have other components that PFGMs don't (like text encoding networks), so it's entirely reasonable for them to be larger!

4

SleekEagle OP t1_iu49wmf wrote

I can't imagine they will replace DMs at this point. DMs have been developed significantly over the last two years, and there have been a lot of notable advancements that have greatly improved their efficiency. On top of that, Stable Diffusion runs on DMs, and it is already cemented as a pervasive open-source tool.

It is also unclear at this point how to add conditioning to this setup, or how something like super-resolution would work. Definitely not saying it's impossible, just that it's at a very nascent stage. It will be really cool to see how the two approaches interact, e.g. incorporating DMs and PFGMs together in an Imagen-like setup!

1

SleekEagle OP t1_itznmgg wrote

Background

The past few years have seen incredible progress in the text-to-image domain. Models like DALL-E 2, Imagen, and Stable Diffusion can generate extremely high-quality, high-resolution images given only a sentence describing what should be depicted.

These models rely heavily on Diffusion Models (DMs). DMs are a relatively new type of Deep Learning method that is inspired by physics. They learn how to start with random "TV static", and then progressively "denoise" it to generate an image. While DMs are very powerful and can create amazing images, they are relatively slow.
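
For intuition, here is a heavily simplified sketch of what that "progressive denoising" loop looks like for a DDPM-style model. This is my own illustration, not code from any of the models above, and `denoise_model` and the noise-schedule tensors are hypothetical placeholders:

```python
import torch

def sample_ddpm(denoise_model, alphas, alpha_bars, betas, shape):
    """Sketch of DDPM-style ancestral sampling: start from pure noise ("TV static")
    and repeatedly subtract the noise predicted by the learned model."""
    x = torch.randn(shape)                     # random static as the starting point
    for t in reversed(range(len(betas))):
        eps_hat = denoise_model(x, t)          # model's estimate of the noise at step t
        # DDPM mean update: remove the predicted noise component
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # re-inject a bit of noise
    return x
```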

Researchers at MIT have recently unveiled a new image generation model that is also inspired by physics. While DMs pull from thermodynamics, the new models, PFGMs, pull from electrostatics. They treat the data points as charged particles, and generate data by moving points along the electric field generated by the data.
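
At a very hand-wavy level, sampling looks like the sketch below: drop a point far away from the data and let it flow along the (learned) field back toward the data. This is just a conceptual illustration with a fixed-step Euler loop and a hypothetical `poisson_field` network; the actual method integrates an ODE with a proper solver:

```python
import numpy as np

def sample_pfgm_sketch(poisson_field, prior_sample, n_steps=500, step_size=0.01):
    """Conceptual sketch: the data acts like a set of positive charges whose field
    points away from the data, so we step *against* the field to flow a faraway
    prior sample back toward the data distribution."""
    x = prior_sample.copy()
    for _ in range(n_steps):
        field = poisson_field(x)                 # learned estimate of the field at x
        x = x - step_size * field / np.linalg.norm(field)  # move toward the data
    return x
```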

Why they matter

PFGMs (Poisson Flow Generative Models) constitute an exciting area of research for many reasons; in particular, they have been shown to be 10-20x faster than DMs with comparable image quality.

Topics of discussion

  1. With the barrier to generating images getting lower and lower (Stable Diffusion was released only a couple of months ago!), how will the ability for anyone to create high-quality images and art affect the economy and what we perceive to be valuable art in the coming years? Consider the short, medium, and long term.
  2. DMs and PFGMs are both inspired by physics. Machine Learning, and Deep Learning especially, has been integrating concepts from higher-level math and physics over the past several years. How will research in these well-developed domains inform the development of Deep Learning? In particular, will leveraging hundreds of years of scientific knowledge and mathematics research be at the foundation of an intelligence explosion?
  3. Even someone casually interested in AI will have noticed the incredible progress made in the last couple of years. While new models like this are great and advance our understanding, what responsibility do researchers, governments, and private entities have to ensure AI research is being done safely? Or, should we intentionally not be attempting to build any guard rails to begin with?
2

SleekEagle OP t1_itzkagl wrote

This is the first paper on this approach! I spoke to the authors and they're planning on continuing research down this avenue (personally I think dropping a PFGM as the base generator for Imagen and then keeping Diffusion Models for the super resolution chain would be very cool), so be on the lookout for more papers!

3

SleekEagle OP t1_itxpvth wrote

My pleasure! I'm not sure I understand exactly what you're asking, could you try to rephrase it? In particular, I'm not sure what you mean by preservation of the data distribution.

Maybe this will help answer: given an exact Poisson field generated by a continuous data distribution, PFGMs provide an exact deterministic mapping to/from a uniform hemisphere. While we do not know this Poisson field exactly, we can estimate it given many data points sampled from the distribution. PFGMs therefore provide a deterministic mapping between the uniform hemisphere and the distribution that corresponds to the learned empirical field, but not exactly to the data distribution itself. Given a lot of data, though, we expect this approximate distribution to be very close to the true distribution (this is universal to all generative models).
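
In case it helps make the "uniform hemisphere" prior concrete, here is a tiny sketch (my own illustration, not code from the paper) of drawing samples uniformly from a large hemisphere in the augmented space:

```python
import numpy as np

def sample_uniform_hemisphere(n_samples, data_dim, radius):
    """Sample points uniformly on the upper half of a sphere of the given radius:
    normalized Gaussians are uniform on the sphere, and reflecting the last
    (augmented "z") coordinate to be non-negative restricts to a hemisphere."""
    v = np.random.randn(n_samples, data_dim + 1)     # data dims + 1 augmented dim
    v /= np.linalg.norm(v, axis=1, keepdims=True)    # uniform on the unit sphere
    v[:, -1] = np.abs(v[:, -1])                      # keep z >= 0 -> upper hemisphere
    return radius * v
```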

Thanks for reading! I did have a little trouble getting the repo working on my local machine, so you may run into some hiccups as well. I reached out to the authors while writing this article, and I believe they are planning on continuing research into PFGMs, so keep an eye out for future developments!

10

SleekEagle OP t1_itwepjh wrote

u/Serverside There are a few things to note here.

First, the non-stochasticity allows for likelihood evaluation. Second, it allows the authors to use ODE solvers (RK45 in the case of PFGMs) instead of the SDE solvers, potentially combined with application-specific methods like Langevin MCMC, that score-based models use. Further, for diffusion there are many discrete time steps that need to be evaluated in series (at least for the usual discrete-time diffusion models). The result is that PFGMs are faster than these stochastic methods. Lastly, the particular ODE for PFGMs has a weaker norm-time correlation than other ODEs, which in turn makes sampling more robust.
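
As a concrete illustration of the ODE-solver point: once the dynamics are deterministic, you can hand sampling to an off-the-shelf adaptive solver. A rough sketch using SciPy's RK45, where `field_fn` is a hypothetical stand-in for the learned network defining dx/dt:

```python
import numpy as np
from scipy.integrate import solve_ivp

def ode_sample(field_fn, x0, t_start=1.0, t_end=0.0):
    """Integrate dx/dt = field_fn(t, x) from a prior sample x0 at t_start
    back to t_end with an adaptive RK45 solver; no stochastic steps needed."""
    sol = solve_ivp(
        fun=lambda t, x: field_fn(t, x.reshape(x0.shape)).ravel(),
        t_span=(t_start, t_end),     # integrating "backwards" in time is fine
        y0=x0.ravel(),
        method="RK45",
        rtol=1e-4,
        atol=1e-4,
    )
    return sol.y[:, -1].reshape(x0.shape)   # final state = generated sample
```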

As for the deterministic mapping, it is actually the reason (at least in part) that e.g. interpolations work for PFGMs. The mapping is a continuous transformation, mapping "near points to near points" by definition. I think the determinism ensures that interpolated paths in the latent space transform to well-behaved paths in the data space, whereas a stochastic element would very likely break this correspondence. The stochasticity in VAEs is useful for learning the parameters of a distribution and is required to sample from that distribution, but once a point is sampled it is (usually) deterministically mapped back to the data space, iirc.
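
To make the interpolation point concrete, here is a small sketch (assuming some deterministic `latent_to_image` mapping, e.g. the ODE sampler idea above):

```python
import numpy as np

def interpolate(latent_to_image, z_a, z_b, n=8):
    """Decode points along a straight line between two latents. Because the
    latent -> data mapping is deterministic and continuous, nearby latents
    decode to nearby images, so the sequence varies smoothly."""
    return [
        latent_to_image((1 - alpha) * z_a + alpha * z_b)
        for alpha in np.linspace(0.0, 1.0, n)
    ]
```

(For latents living on a hemisphere, spherical interpolation would be the more natural choice, but the idea is the same.)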

20

SleekEagle t1_is6trdz wrote

I'm really excited to see what the next few years bring. I've always felt like there's a lot of room for growth from higher level math, and it seems like that's beginning to happen.

I'll have a blog coming out soon on another physics-inspired model like DDPM, stay tuned!

4

SleekEagle t1_irx747e wrote

Is progress really in question? It seems very obvious that we have made progress in the last 5 years; judging the field by GANs alone seems ridiculous when Diffusion Models are sitting right there. Not trying to be a jerk, genuinely curious if anyone actually thinks that progress as a whole is not being made?

I definitely sympathize with frustration over the "incremental progress" that comes down to 0.1% better performance on some imperfect metric between big developments (GANs, transformers, diffusion models), but setting those papers aside and looking at the bigger trends, it seems obvious that really incredible progress has been made.

8

SleekEagle t1_ir0hr1m wrote

The outputs of neurons are passed through a nonlinearity, which is essential to any (complex) learning process. If we didn't do this, the NN would be a composition of linear functions, which is itself a linear function (pretty boring).

As for why we choose to operate on inputs with an affine transformation before putting them through a nonlinearity, I see two reasons. The first is that linear transformations are well understood and succinct to work with theoretically. The second is that computers (GPUs in particular) are very good at matrix multiplication, so we do a lot of the "heavy lifting" with it and then just pass the result through a nonlinearity so we don't end up with a boring (linear) learning process.
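
A quick numerical illustration of the "composition of linear functions is linear" point (my own toy example, not tied to any particular framework):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x = rng.standard_normal(3)

# Two stacked affine layers with no nonlinearity in between...
two_layers = W2 @ (W1 @ x + b1) + b2
# ...collapse into a single affine layer: same output, no extra expressivity.
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(two_layers, one_layer))   # True

# A ReLU between the layers breaks the collapse, which is what lets
# the network represent functions a single affine map cannot.
with_relu = W2 @ np.maximum(W1 @ x + b1, 0) + b2
print(np.allclose(with_relu, one_layer))    # False (in general)
```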

Just my 2 cents, happy for input/feedback!

1