Hi,

I'm trying to use the Open Images Dataset to train a YOLOv5 model. However, not every object in an image is labelled. For example, an image may contain a computer, a table, and a chair, where the computer and the table are labelled with bounding boxes but the chair is not. Many other images that include chairs do have chair labels.

This will affect training. If I want to ignore the unlabelled objects in some images when computing the loss, how can I do that?

Please let me know the solution or some websites that have the answer.

Comments

mmeeh t1_jdupe4m wrote on March 27, 2023 at 10:11 AM

Either you label the chairs in the non-labelled images or you remove the chairs from the labelled images. You can try some image manipulation techniques to remove the unlabelled chairs, but unless you use Photoshop or some highly advanced AI to remove those chairs, it's a really bad idea to use that dataset to train a model to recognize objects that include chairs.

AI-without-data OP t1_jdurfc6 wrote on March 27, 2023 at 10:38 AM

Thank you for the comment. So do I need to modify the images manually, one by one, to remove the chairs?

mmeeh t1_jdusxmd wrote on March 27, 2023 at 10:56 AM

Each of these bounding boxes has a label when you are training with YOLO, so you can just write some code that identifies the boxes labelled as chair and excludes them from the dataset.
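A minimal sketch of that filtering, assuming the labels are in the usual YOLO text format (one `class x_center y_center width height` line per box). The `drop_class` helper and the `CHAIR_ID` value are made up for illustration; use the class index from your own dataset config.

```python
def drop_class(label_lines, class_id):
    """Return the YOLO label lines whose class id is not `class_id`."""
    kept = []
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if int(parts[0]) != class_id:
            kept.append(line.strip())
    return kept


CHAIR_ID = 2  # hypothetical class index, not taken from any real config

# Contents of one label file, as a list of lines.
labels = [
    "0 0.50 0.40 0.20 0.30",
    "2 0.70 0.80 0.10 0.25",  # the chair box
    "1 0.30 0.60 0.40 0.20",
]

filtered = drop_class(labels, CHAIR_ID)
print(filtered)
```

Note that this only removes the annotations, not the chairs in the pixels, so it only makes sense if the chair class is dropped from the model entirely.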

AI-without-data OP t1_jdv5l7k wrote on March 27, 2023 at 1:02 PM

I need the chair class for the model, but some images don't have the label of chair even though the images include chairs. And I want to use those images for training because they include other classes that should be trained.

thebruce87m t1_jdvby1c wrote on March 27, 2023 at 1:53 PM

Sounds like you need to label the chairs.

AI-without-data OP t1_jdvfwff wrote on March 27, 2023 at 2:21 PM

So do I need to label chairs in all the images?

OK, for example, there is the famous COCO dataset. Some objects, for example 'book', appear in some images without being labelled (not all; many images do have labelled 'book' objects). People still train on the dataset and somehow detect 'book' objects well. I just want to know how they handle the unlabelled 'book' objects in some of the data.

thebruce87m t1_jdvhv7r wrote on March 27, 2023 at 2:35 PM

If you want it to be good at detecting books then yes, all books should be labelled.

If they are not, what are the implications? Perhaps book detections will have lower confidence than they should? Maybe it will ignore some styles of books?

AI-without-data OP t1_jdviqi9 wrote on March 27, 2023 at 2:41 PM

I got it. Thank you so much for your answer.

deepForward t1_jdvdq24 wrote on March 27, 2023 at 2:06 PM

If you already have some labeled chairs, train a first model with that, then run it on images with chairs and no label. Have a second pass with your enriched dataset, and eventually a third, etc.

You can bootstrap the labelling that way. It should help you label a decent amount of chairs, and you can then label manually the remaining chairs.
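One pass of that bootstrapping could look roughly like the sketch below. It assumes a single class (chairs), boxes given as `(x1, y1, x2, y2)` in pixels, and predictions given as `(box, confidence)` pairs from the first model. The function names and the `min_conf` / `max_iou` defaults are illustrative assumptions, not values from any library.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def add_pseudo_labels(existing, predictions, min_conf=0.9, max_iou=0.5):
    """Add confident predictions that do not overlap a box already kept."""
    merged = list(existing)
    for box, conf in predictions:
        if conf < min_conf:
            continue  # too uncertain to trust as a label
        if all(iou(box, kept) < max_iou for kept in merged):
            merged.append(box)
    return merged


existing = [(10, 10, 50, 50)]              # chair already labelled by hand
predictions = [
    ((12, 11, 49, 52), 0.97),              # same chair, already labelled
    ((100, 100, 150, 180), 0.95),          # new, confident detection
    ((200, 200, 220, 220), 0.40),          # low confidence, dropped
]

merged = add_pseudo_labels(existing, predictions)
print(merged)
```

The merged boxes become the labels for the next training pass; anything the model missed still has to be labelled by hand.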

AI-without-data OP t1_jdvhnjr wrote on March 27, 2023 at 2:34 PM

Thank you. But I don't understand it clearly.

Do people train the model in that way as well? In the COCO dataset, some images contain objects that are not labeled but are listed in the classes.

If people follow your suggested method for training the model, they would need to first filter out images with perfectly labeled objects (no missed labels) from the COCO dataset and use that filtered data to train the model. Then they would need to run the model on the remaining data to obtain labels for objects that are not included in the dataset, and update the entire dataset accordingly. Is this correct?

[deleted] t1_jdvn3rn wrote on March 27, 2023 at 3:10 PM

[deleted]

qphyml t1_jdz7ndt wrote on March 28, 2023 at 7:26 AM

I think you can do it both ways (with or without filtering) and compare. Just speculating now, but the filtering could potentially affect the performance on the other classes (since you change the model’s training path for those classes). But my guess is that that should not be a big issue, so I’d probably go about it the way you described if I had to pick one strategy.

AI-without-data OP t1_jdzb8fd wrote on March 28, 2023 at 8:19 AM

OK, I appreciate it! I'll try filtering out images.

qphyml t1_jdzg13b wrote on March 28, 2023 at 9:31 AM

Good luck! Would be great to hear how it goes and which insights you get!

deepForward t1_je0ktqx wrote on March 28, 2023 at 3:31 PM

Try the easy way first:

Build a model that only learns chairs, with all labeled chairs you have and ignore anything else at first.

Try also image data augmentations and see if it helps.

You are not looking for the best score; actually, you don't care about your score as long as you can label new chairs.

You mostly want to tune the model so that you don't have false positives, which would introduce noise into your labels. False negatives are OK, and will occur if you tune the model so that FP are zero. For instance, you can tune the threshold on a confidence score or class probability (check the model you're using).
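As a rough sketch of that tuning, assume you have hand-checked a small set of detections and recorded each as a `(confidence, is_correct)` pair. The helper names below are hypothetical; the idea is simply to set the threshold at the highest-scoring false positive and keep only detections above it.

```python
def zero_fp_threshold(detections):
    """Highest confidence among the false positives (0.0 if there are none)."""
    fp_scores = [conf for conf, is_correct in detections if not is_correct]
    return max(fp_scores) if fp_scores else 0.0


def keep_confident(detections, threshold):
    """Keep detections scoring strictly above the threshold."""
    return [d for d in detections if d[0] > threshold]


# Hand-checked detections: (confidence, is_correct)
checked = [
    (0.95, True),
    (0.90, True),
    (0.80, False),  # false positive
    (0.70, True),   # will be lost: a false negative for labelling purposes
    (0.60, False),  # false positive
]

threshold = zero_fp_threshold(checked)
kept = keep_confident(checked, threshold)
print(threshold, kept)
```

On this toy data the correct detection at 0.70 is discarded, which is the trade-off described above: no noisy labels, at the cost of some missed chairs.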

You can also build a basic image validation tool with Jupyter notebook widgets, Streamlit, or your favorite tool, if you want to quickly validate by hand that there are no false positives. It's a very good exercise.

Good luck !

AI-without-data OP t1_je5078l wrote on March 29, 2023 at 1:41 PM

I see. I think changing the threshold on the confidence score and probability is a good idea. I should try these approaches step by step. Thank you!

stuv_x t1_jdxwmwm wrote on March 28, 2023 at 12:11 AM

It might not affect training as much as you think - you could also modify your loss function so it doesn’t penalise false positives as much as false negatives
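To illustrate the idea (this is not YOLOv5's actual loss code), here is a sketch of a binary cross-entropy term for a single prediction where the "predicted an object, but no label is there" case is down-weighted. The `fp_weight` parameter and its default of 0.25 are arbitrary assumptions for the example.

```python
import math


def weighted_bce(prob, target, fp_weight=0.25, eps=1e-7):
    """Binary cross-entropy for one prediction.

    target == 1: the usual penalty for missing a labelled object.
    target == 0: penalty scaled by fp_weight, so a confident prediction on an
                 unlabelled region (possibly a missing label) costs less.
    """
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    if target == 1:
        return -math.log(p)
    return -fp_weight * math.log(1.0 - p)


# Confident prediction where no label exists (maybe an unlabelled chair).
plain = weighted_bce(0.9, 0, fp_weight=1.0)  # standard BCE
soft = weighted_bce(0.9, 0)                  # down-weighted version
print(plain, soft)
```

With `fp_weight=1.0` this reduces to standard binary cross-entropy, so the weight can be tuned between "trust the missing labels fully" and "mostly ignore them".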

AI-without-data OP t1_jdy05d6 wrote on March 28, 2023 at 12:37 AM

OK, I will try modifying the loss function as you suggest. Thank you!