orbital_lemon t1_j7idllq wrote

It saw stock photo watermarks millions of times during training. Nothing else in the training data comes even close. Even at half a bit per training image, that can add up to memorization of a shape.

Apart from the handful of known cases involving images that are duplicated many times in the training data, actual image content can't be reconstructed the same way.

2

pm_me_your_pay_slips t1_j7l6icx wrote

Note that the autoencoder (VAE) part of the SD model alone can encode and decode arbitrary natural or human-made images quite well, with very few artifacts. The diffusion model part of SD learns a distribution over images in that encoded latent space.
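The two-stage setup described above (compress images into a latent space, then model that space) can be sketched with a toy stand-in. This is not SD's actual autoencoder, just a minimal linear autoencoder built from an SVD in NumPy; the dataset, dimensions, and helper names are all illustrative assumptions.

```python
import numpy as np

# Toy stand-in for the latent-diffusion split (NOT Stable Diffusion's
# real VAE): a linear autoencoder compresses "images" into a small
# latent space; the generative (diffusion) part would then be trained
# on these latents rather than on raw pixels.
rng = np.random.default_rng(0)

# Fake dataset: 200 flattened 8x8 "images" that actually lie in a
# 4-dimensional subspace, so a 4-dim latent code can represent them.
basis = rng.normal(size=(4, 64))
data = rng.normal(size=(200, 4)) @ basis

# Fit encoder/decoder: top-4 right singular vectors of the data.
_, _, vt = np.linalg.svd(data, full_matrices=False)
components = vt[:4]                   # shape (4, 64)

def encode(x):
    return x @ components.T           # pixels -> 4-dim latents

def decode(z):
    return z @ components             # latents -> pixels

latents = encode(data)                # what a diffusion model would see
recon = decode(latents)
err = float(np.max(np.abs(recon - data)))
print(f"latent shape: {latents.shape}, max reconstruction error: {err:.2e}")
```

The point of the sketch: the heavy lifting of reproducing pixel detail lives in the decoder, while the generative model only ever sees the much smaller latent codes.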

1

orbital_lemon t1_j7lel1d wrote

The diffusion model weights are the part at issue, no? The question is whether you can squeeze infringing content out of those weights to feed to the VAE.

1