GusPlus t1_j7i5lt3 wrote on February 6, 2023 at 11:30 PM

Reply to comment by Ne_Nel in [N] Getty Images sues AI art generator Stable Diffusion in the US for copyright infringement by Wiskkey

I’d like to know how it was trained to produce the GI watermark without copying GI images into the training data.

Ne_Nel t1_j7i67yp wrote on February 6, 2023 at 11:35 PM

What are you talking about? The dataset is open source and there are thousands of Getty images. That isn't the discussion here.

orbital_lemon t1_j7idllq wrote on February 7, 2023 at 12:27 AM

It saw stock photo watermarks millions of times during training. Nothing else in the training data comes even close. Even at half a bit per training image, that can add up to memorization of a shape.

Apart from the handful of known cases involving images that are duplicated many times in the training data, actual image content can't be reconstructed the same way.
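
To put rough numbers on the "half a bit per training image" point, here is a back-of-the-envelope sketch in Python. The image count (2 million) and the 256x64 one-bit mask standing in for a watermark shape are made-up illustration values, not figures from the dataset, and the function names are hypothetical.

```python
# Back-of-the-envelope: how much information could accumulate about a
# pattern that recurs across many training images?
# ASSUMPTIONS (hypothetical, for illustration only):
#   - 2 million watermarked images (the comment only says "millions")
#   - the watermark shape approximated as a 256x64 one-bit mask

BITS_PER_IMAGE = 0.5  # the figure quoted in the comment above


def accumulated_bits(num_images: int, bits_per_image: float = BITS_PER_IMAGE) -> float:
    """Upper bound on total bits if every image contributes to the same pattern."""
    return num_images * bits_per_image


def mask_bits(width: int, height: int) -> int:
    """Bits needed to store an uncompressed 1-bit mask of the given size."""
    return width * height


if __name__ == "__main__":
    total = accumulated_bits(2_000_000)
    needed = mask_bits(256, 64)
    print(f"accumulated budget: {total:,.0f} bits")
    print(f"mask needs:         {needed:,} bits")
    print(f"budget / mask:      {total / needed:.1f}x")
```

The budget only pools like this because the same shape shows up in every contributing image. Content that appears once gets its half a bit and nothing more, which is why ordinary image content can't be rebuilt the same way.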

pm_me_your_pay_slips t1_j7l6icx wrote on February 7, 2023 at 4:31 PM

Note that the VQ-VAE part of the SD model alone can encode and decode arbitrary natural or human-made images pretty well, with very few artifacts. The diffusion model part of SD learns a distribution of images in that encoded space.
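
For a sense of scale of that encoded space, a small sketch. It assumes the commonly cited Stable Diffusion v1 setup (8x spatial downsampling, 4 latent channels, 512x512 RGB input); the function names are made up for illustration and are not part of any library.

```python
# Size of the latent that the diffusion model operates on, versus the
# pixel-space image the autoencoder maps to and from.
# ASSUMPTIONS: 8x downsampling and 4 latent channels (SD v1 defaults).


def latent_shape(height: int, width: int, downsample: int = 8, latent_channels: int = 4):
    """Return (channels, height, width) of the latent for an image of the given size."""
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height: int, width: int, image_channels: int = 3) -> float:
    """Ratio of pixel-space values to latent-space values."""
    c, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    print("latent shape:", latent_shape(512, 512))
    print("values in image / values in latent:", compression_ratio(512, 512))
```

The diffusion model only ever sees tensors of the latent shape; going to and from pixels is the autoencoder's job.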

orbital_lemon t1_j7lel1d wrote on February 7, 2023 at 5:23 PM

The diffusion model weights are the part at issue, no? The question is whether you can squeeze infringing content out of the weights to feed to the VAE.