mongoosefist t1_j71dbhq wrote
Reply to comment by NitroXSC in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Differential privacy methods already work in a way that's quite similar to the denoising process of diffusion models. The problem is that most differential privacy methods rely on the discreteness of the data. The latent space of a diffusion model is completely continuous, so there is no way to tell the difference between similar images, and thus no way to tell which ones, if any, come from the training data.
For example, suppose you're pretty sure the diffusion model has memorized an oil painting of Kermit the Frog. There is no way to say with any reasonable certainty whether the images you denoise that turn out to be oil paintings of Kermit come from an actual training image, or from the region of latent space where the distribution of oil paintings overlaps the distribution of pictures of Kermit. There is no hard point where one transitions into the other, and no meaningful difference in density between the distributions.
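To make the overlap point concrete, here's a toy sketch (everything here is made up for illustration: a 1-D "latent space" with two Gaussian modes, one standing in for a hypothetically memorized image and one for the surrounding distribution of similar images). With equal priors, the posterior that a sample came from the memorized mode is just a likelihood ratio, and near the overlap it's a coin flip:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Toy 1-D "latent space" (hypothetical numbers):
memorized = (0.0, 1.0)  # (mean, std) of the mode for a memorized image
generic = (1.5, 1.0)    # overlapping mode for similar, non-memorized images

def p_memorized(z):
    """Posterior that latent z came from the memorized mode (equal priors)."""
    pm = gaussian_pdf(z, *memorized)
    pg = gaussian_pdf(z, *generic)
    return pm / (pm + pg)

print(round(p_memorized(-2.0), 3))  # far from the overlap: confidently "memorized"
print(round(p_memorized(0.75), 3))  # midpoint between the modes: exactly 0.5
```

Because the densities blend smoothly into each other, there is no threshold you can draw that cleanly separates "training image" from "plausible sample nearby", which is exactly why discreteness-based privacy arguments don't transfer.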
mongoosefist t1_j6zfxe8 wrote
Reply to comment by NitroXSC in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
How would you know that you had recovered it if you didn't know the training data a priori?
mongoosefist t1_j6wed0f wrote
Reply to comment by bushrod in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
When the latent representation is trained, it should learn an accurate representation of the training set, though with some noise, because of the regularization that comes from learning the features together with Gaussian noise in the latent space.
So by "theoretically" I meant that, because of the way the VAE is trained, on paper you could prove that you should be able to get an arbitrarily close reconstruction of any training image if you can steer the denoising process in a very specific way. Which is exactly what these people did.
I will say there is some hand-waving involved, however: even though it should be possible, if you have enough images that are similar enough that their distributions overlap significantly in the latent space, recovering those 'memorized' images becomes intractably difficult.
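The recovery side of this can be sketched too. One way to hunt for memorization without knowing the training data a priori is to generate many samples and look for suspiciously tight clusters: if the model keeps collapsing onto the same point, that's a candidate memorized image. This is a toy stand-in for that idea (greedy distance-based grouping over 2-D points instead of real image embeddings; the function name, radius, and cluster-size threshold are all invented for illustration):

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_memorization_candidates(samples, radius, min_size=3):
    """Greedily group samples within `radius` of a cluster's first member.
    Large, tight clusters suggest the model regenerates the same point,
    i.e. a candidate memorized example. (Toy heuristic, not the paper's
    actual procedure.)"""
    clusters = []
    for s in samples:
        for c in clusters:
            if l2(s, c[0]) < radius:
                c.append(s)
                break
        else:
            clusters.append([s])
    return [c for c in clusters if len(c) >= min_size]

random.seed(0)
# 20 fake "generations": 15 scattered, 5 collapsed onto nearly the same point.
scattered = [[random.uniform(-5, 5), random.uniform(-5, 5)] for _ in range(15)]
collapsed = [[2.0 + random.gauss(0, 0.05), 2.0 + random.gauss(0, 0.05)]
             for _ in range(5)]
candidates = find_memorization_candidates(scattered + collapsed, radius=0.5)
print([len(c) for c in candidates])
```

The flip side is the overlap problem above: once the "memorized" mode sits inside a dense region of similar samples, the cluster stops standing out and the search becomes intractable.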
mongoosefist t1_j6ufv6a wrote
Is this really that surprising? Theoretically, every image from CLIP should be in the latent space in close-to-original form. Obviously these guys went through a fair amount of trouble to recover these images, but it shouldn't surprise anyone that it's possible.
mongoosefist t1_iwzv3e7 wrote
Reply to comment by Ok-Western2685 in [N] new SNAPCHAT feature transfers an image of an upper body garment in realtime on a person in AR by SpatialComputing
Your statements are only correct if you're being extremely specific about this use case.
Real-time AR 'clothing try-on' tools have been around for several years already. I've never seen an implementation that supports arbitrary garments, but given that this isn't meaningfully different from Snap's normal AR filters, I'd argue this implementation has a long way to go to catch up to the state of the art.
mongoosefist t1_iwz4hkk wrote
Reply to [N] new SNAPCHAT feature transfers an image of an upper body garment in realtime on a person in AR by SpatialComputing
I'm not really impressed by this, tbh. Besides the obvious small issues with temporal stability, it looks like the garment is fitted rather naively to the geometry of the person.
It seems to have just applied the sweater's pattern to the subject's arms, so it's not as if they're 'wearing' it. It even applies the exact same folds from the source sweater onto the person's arms, because it can't tell what is printed pattern and what is shading caused by the geometry of the sweater itself.
mongoosefist t1_j7tt6a2 wrote
Reply to [D] Are there emergent abilities of image models? by These-Assignment-936
Emergent behaviour is called that because we don't yet have the ability to predict it; we can only observe it and deduce after the fact where it emerged. So, the fact that you can't wrap your head around what such an ability would look like makes perfect sense!
If we're speculating I'd put my money on /u/ID4gotten 's answer. I bet one of these models starts integrating some intuition of physical laws.