Submitted by florinandrei t3_11kbuzq in MachineLearning

In December last year, I completed my MS in Data Science. My capstone project had to do with semantic segmentation of medical ultrasound images (TLDR: cancer detection). I used a transformer model based on SegFormer. After the project was completed, I tried to improve the model's performance a bit more.

I was surprised by the IoU performance, which seemed a little too good to be true. I ended up writing my own metrics code, which calculated IoU, Dice, precision, and recall, among other things. My IoU results, computed with my own code, were consistently lower than the IoU results I got from the library I was using at the time, the Evaluate library from Hugging Face. But their IoU was equal to what my code computed as recall (sensitivity). I opened a ticket with Hugging Face:

https://github.com/huggingface/evaluate/issues/421

They basically said they had copied that whole code from OpenMMLab and I should take it up with them. So I did:

https://github.com/open-mmlab/mmsegmentation/issues/2655

That was more than a week ago and there's still no reply. Meanwhile, I've seen other bug reports that appear to point to the same problem:

https://github.com/open-mmlab/mmsegmentation/issues/2594

I'm pretty sure I am right. The definition of IoU is quite simple, and there isn't much room for interpretation. Their code fails simple test cases.
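For example, here is the kind of simple test case I mean, written as a minimal NumPy sketch (my own code, not the library's), where IoU and recall clearly diverge:

```python
import numpy as np

# Toy binary masks: 1 = lesion, 0 = background
label = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred  = np.array([[1, 1, 1, 1],
                  [1, 1, 1, 1]])   # the model over-predicts the lesion

tp = np.sum((pred == 1) & (label == 1))   # true positives
fp = np.sum((pred == 1) & (label == 0))   # false positives
fn = np.sum((pred == 0) & (label == 1))   # false negatives

iou    = tp / (tp + fp + fn)   # 4 / 8 = 0.5
recall = tp / (tp + fn)        # 4 / 4 = 1.0
print(iou, recall)
```

A correct IoU implementation has to report 0.5 here; anything that reports 1.0 is computing recall.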

My concern is this: since the code effectively calculates recall instead of IoU, recall is always greater than or equal to IoU, and the MMSegmentation library is widely used in image segmentation research, it's possible there are quite a few results in the literature that are a few percentage points higher than they should be (e.g. 90% IoU instead of 85%).

Thoughts?

26

Comments


Mediocre-Bullfrog686 t1_jb71qkj wrote

Pixels with ignore_index are meant to be ignored (e.g., pixels in the ground-truth image that the annotators are not sure about). It does not mean they belong to a "negative class". It is correct to ignore those pixels during the IoU computation.

4

florinandrei OP t1_jb7cujz wrote

The problem is that the current algorithm cuts holes in the prediction frames based on ignore_index in the label frames.

Any pixels in the label frames equal to ignore_index cause the corresponding pixels in both the label frames and the prediction frames to be completely excluded from the calculations. If some predicted mask pixels fall into those areas, they are excluded as well. This is the issue that needs to be addressed.

You cannot exclude pixels from the predicted frames based on pixel values in the label frames.

If there is some index you want to ignore altogether, because you are not sure about the quality of the labels, it is best to just exclude it from the calculation of the average metric.

If some users set ignore_index to the value of the background pixels, that will cut very large holes in everything, discarding a lot of pixels from the performance evaluation and severely skewing the results.
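To make the effect concrete, here is a rough sketch (plain NumPy, my own simplification, not the actual intersect_and_union code) of what masking the prediction by the label's ignore regions does to a per-class IoU:

```python
import numpy as np

ignore_index = 255

label = np.array([[1, 1, 255, 255],
                  [0, 0, 255, 255]])   # right half of the label frame is "ignore"
pred  = np.array([[1, 1, 1, 1],
                  [0, 0, 1, 1]])       # the model produces false positives there

valid = label != ignore_index          # mask derived from the label frame alone

tp = np.sum((pred == 1) & (label == 1) & valid)
fp = np.sum((pred == 1) & (label != 1) & valid)
fn = np.sum((pred != 1) & (label == 1) & valid)

print(tp / (tp + fp + fn))   # 1.0 -- the four false positives were simply dropped
```

Without the valid mask, the same prediction scores an IoU of 2/6 ≈ 0.33 for class 1.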

1

Mediocre-Bullfrog686 t1_jb7n704 wrote

>If there is some index you want to ignore altogether, because you are not sure about the quality of the labels, it is best to just exclude it from the calculation of the average metric.

Isn't this what ignore_index is doing? How else should we exclude them from the average metric? By applying ignore_index we effectively ignore those pixels.

>If some users set ignore_index to the value of the background pixels, that will cut very large holes in everything, therefore discarding a lot of pixels from performance evaluation, and will severely skew the results.

Well the users definitely should not do that. This is then a matter of documentation. We cannot just get rid of ignore_index because (I think) it is used in some existing segmentation datasets.

4

florinandrei OP t1_jb7pefb wrote

> Isn't this what the ignore_index is doing?

No, it is not.

Let me repeat: ignore_index cuts holes in both the ground truth label frames, and in the prediction frames coming out of the model. Any pixels in those holes are ignored.

This includes pixels in the predictions from the model. You are ignoring chunks of the model's output.

> How else should we exclude them from the average metric?

By not computing metrics for that pixel value.

average_metric = (metric_index1 + metric_index2 + ... + metric_indexN) / N

Simply do not include it in the sum, and then just divide by N-1 instead.
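In code, the exclusion I mean looks roughly like this (my own sketch, not the library's API):

```python
import numpy as np

def mean_metric_excluding(per_class_metric, excluded):
    """Average a per-class metric, simply leaving out the classes whose labels you do not trust."""
    kept = [value for cls, value in per_class_metric.items() if cls not in excluded]
    return float(np.mean(kept))

# e.g. per-class IoU for three classes, where the labels for class 2 are unreliable
print(mean_metric_excluding({0: 0.90, 1: 0.75, 2: 0.40}, excluded={2}))   # 0.825
```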

What the current code does is not equivalent to that. It discards pixels from both the label frame and the prediction frame based on the shape of some regions in the label frame alone. That makes no sense: whatever the model's predictions happen to be in those holes, they are ignored even if their pixel values are different from ignore_index.

In other words, you are throwing away parts of the model's output, regardless of their values, purely because of where ignore_index appears in the label frame.

2

lynnharry t1_jb8n8p0 wrote

Marking pixels with ignore_index does not mean the model's output should also be ignore_index. It means the ground-truth label is undetermined at those pixels, so whatever your model outputs there, its correctness cannot be judged.

For those undetermined pixels, we simply ignore those outputs completely.

ignore_index is not meant to ignore a specific category during the metric calculation, which is what you're proposing. It simply tells intersect_and_union that some areas of the image have undetermined labels and should be ignored; those areas are marked with the value of ignore_index.

2

LappenX t1_jbap16k wrote

That is exactly what should be happening. In the Cityscapes dataset, for example, part of the ego vehicle is always visible at the bottom of the image, and those pixels are set to the ignore label so they are excluded from the training loss and the test metrics.
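For what it's worth, this is the same convention the training side uses; e.g. PyTorch's cross-entropy loss takes an ignore_index and simply drops those pixels from the average (illustrative values below, with 255 as the usual ignore label):

```python
import torch
import torch.nn as nn

# Pixels labelled 255 contribute nothing to the loss.
criterion = nn.CrossEntropyLoss(ignore_index=255)

logits = torch.randn(1, 19, 4, 4)                       # (batch, num_classes, H, W)
target = torch.full((1, 4, 4), 255, dtype=torch.long)   # start fully ignored
target[0, :2, :2] = 3                                   # a few real labels (class 3)

loss = criterion(logits, target)                        # averaged over the non-ignored pixels only
```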

1