Submitted by von-hust in MachineLearning
InterlocutorX wrote
Reply to comment by Albino_Jackets in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
>The duplicates aren't perfect duplicates and are added to create more robust model results
This is incorrect, and anyone who looks at the LAION-5B aesthetic set can tell pretty easily: it contains identical copies of images that are easy to spot.
And the noisy Stallone was a Stable Diffusion (SD) output, not an image from the dataset.
[I looked at the images it has for Henry Cavill, and 6 out of 24 are the exact same Witcher promo shot, which is a quarter of all the images it has of Cavill.]
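A minimal sketch of how one might check for exact copies like these, assuming the images are already downloaded as raw file bytes. The function names and sample data are hypothetical, and this is not the deduplication method used in the paper under discussion. Hashing the bytes only groups perfect (byte-identical) duplicates; re-encoded, resized or cropped versions of the same picture hash differently and would be missed.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(images: list[bytes]) -> list[list[int]]:
    """Group indices of byte-identical images by their SHA-256 digest.

    Only catches exact copies (same bytes); near-duplicates such as
    re-encoded or resized versions of a picture are NOT grouped.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, data in enumerate(images):
        groups[hashlib.sha256(data).hexdigest()].append(idx)
    # Keep only digests that occur more than once.
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def largest_group_share(images: list[bytes]) -> float:
    """Fraction of all images taken up by the biggest group of identical copies."""
    groups = group_exact_duplicates(images)
    if not images or not groups:
        return 0.0
    return max(len(g) for g in groups) / len(images)


if __name__ == "__main__":
    # Hypothetical stand-in data: 24 "images", 6 of which are the same shot.
    promo = b"same-promo-shot"
    others = [f"unique-{i}".encode() for i in range(18)]
    sample = [promo] * 6 + others
    print(largest_group_share(sample))  # 0.25, i.e. 6 / 24
```

The 6-of-24 case works out to 6 / 24 = 0.25, the "quarter" mentioned above.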
Feel free to look for yourself: