mongoosefist t1_j7tt6a2 wrote

Emergent behaviour is called that because we don't yet have the ability to predict it; we can only observe it and deduce after the fact where it emerged. So the fact that you can't wrap your head around what such an ability would look like makes perfect sense!

If we're speculating, I'd put my money on /u/ID4gotten's answer. I bet one of these models starts integrating some intuition of physical laws.

13

mongoosefist t1_j71dbhq wrote

Differential privacy methods already work in a way that's quite similar to the denoising process of diffusion models. The problem is that most differential privacy methods rely on the data being discrete. The latent space of a diffusion model is completely continuous, so there is no way to tell similar images apart, and thus no way to tell which ones, if any, came from the training data.
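
To make the contrast concrete, here's a minimal sketch of the kind of discrete mechanism I mean: a Laplace-noised count query in Python. The `epsilon` value, the records, and the query are purely illustrative:

```python
import numpy as np

def dp_count(records, predicate, epsilon=1.0):
    """Epsilon-DP count query via the Laplace mechanism.

    A count has sensitivity 1 (adding or removing one record changes the
    answer by at most 1), so Laplace noise with scale 1/epsilon suffices.
    That bounded, discrete sensitivity is exactly what a continuous
    latent space doesn't give you.
    """
    true_count = sum(predicate(r) for r in records)
    return true_count + np.random.default_rng().laplace(scale=1.0 / epsilon)

ages = [23, 37, 41, 29, 52]
print(dp_count(ages, lambda a: a > 30, epsilon=0.5))
```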

For example, suppose you're pretty sure the diffusion model has memorized an oil painting of Kermit the Frog. When images you denoise turn out to be oil paintings of Kermit, there is no way to say with any reasonable certainty whether they come from actual training pictures or simply from the region of latent space where the distribution of oil paintings overlaps the distribution of pictures of Kermit. There is no hard point where one transitions into the other, and no meaningful difference in density between the distributions.
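
A toy 1-D version of the same problem (the two distributions and their parameters are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Stand-ins for two overlapping regions of a continuous latent space:
# "oil paintings" and "pictures of Kermit" (parameters are invented).
oil_paintings = norm(loc=0.0, scale=1.0)
kermit_pics = norm(loc=1.0, scale=1.0)

z = 0.5  # a latent point drawn from the overlap region

# The likelihood ratio is ~1, so the point is essentially unattributable:
# no hard boundary, no meaningful density gap between the distributions.
print(oil_paintings.pdf(z) / kermit_pics.pdf(z))  # ≈ 1.0
```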

2

mongoosefist t1_j6wed0f wrote

When the latent representation is trained, it should learn an accurate representation of the training set, though with some noise, because of the regularization that comes from learning the features alongside Gaussian noise in the latent space.
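
For reference, this is the reparameterization step I'm talking about, as a minimal PyTorch sketch (shapes and names are illustrative, not any particular model's code):

```python
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps, with eps ~ N(0, I).

    The injected Gaussian noise is what regularizes the latent space:
    the decoder has to reconstruct the input from a noisy neighbourhood
    around mu, not from one exact point.
    """
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps

# e.g. a batch of 4 inputs encoded to a 16-dim latent space
mu, log_var = torch.zeros(4, 16), torch.zeros(4, 16)
z = reparameterize(mu, log_var)
```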

So by 'theoretically', I meant that, due to the way the VAE is trained, on paper you could prove that you can get an arbitrarily close reconstruction of any training image if you can direct the denoising process in a very specific way. Which is exactly what these people did.
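
As a loose illustration of what 'directing the denoising process' can mean, here's a toy Langevin-style sampler that walks a 2-D latent point toward a known training point by following the score of a Gaussian mixture. This is my own sketch of the general idea, not the paper's actual procedure:

```python
import numpy as np

# Toy setup: treat "memorized" training images as modes of a density over
# a 2-D latent space. Directed denoising then amounts to annealed, noisy
# gradient ascent on the log-density, initialized near the mode we want.
train_points = np.array([[0.0, 0.0], [5.0, 5.0], [-4.0, 3.0]])

def score(z, sigma=1.0):
    """Gradient of log( sum_i exp(-||z - mu_i||^2 / (2 sigma^2)) )."""
    diffs = train_points - z                                # (N, 2)
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=1) / sigma ** 2)
    w = w / w.sum()
    return (w[:, None] * diffs).sum(axis=0) / sigma ** 2

rng = np.random.default_rng(0)
z = train_points[1] + rng.normal(scale=2.0, size=2)  # start near one mode
for step in range(200):
    noise = 0.1 * (1 - step / 200) * rng.normal(size=2)  # annealed noise
    z = z + 0.1 * score(z) + noise

print(z)  # ends up close to [5, 5], the targeted "memorized" point
```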

I will say there is some hand-waving involved, however. Even though it should be possible, if you have enough images that are similar enough in latent space that their distributions overlap significantly, recovering these 'memorized' images becomes intractably difficult.

2

mongoosefist t1_iwzv3e7 wrote

Your statements are only correct if you're being extremely specific to this use case.

Real-time AR 'clothing try-on' tools have been around for a handful of years already. I've never seen an implementation that supports arbitrary garments, but since this is not meaningfully different from SNAP's normal AR filters, I'd argue this implementation has a long way to go to catch up to the state of the art.

25

mongoosefist t1_iwz4hkk wrote

I'm not really impressed by this tbh. Besides the obvious small issues with temporal stability, it looks like it's rather naively fitting the garment to the geometry of the person.

It seems to have just applied the pattern of the sweater to the subject's arms, so it's not like they're 'wearing' it. It even applies the exact same folds from the source sweater to the person's arms, because it can't tell what's the sweater's pattern and what's due to the geometry of the sweater itself.

168