Comments


Meddhouib10 t1_jd792us wrote

Generally, in medicine papers there is some sort of data leakage (e.g., they do data augmentation before splitting into train, val, and test sets).
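
Roughly what that leak looks like, as a toy sketch (nothing here is from the paper; the `augment` helper is a made-up placeholder for whatever transform actually gets applied):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))      # stand-in feature matrix
y = rng.integers(0, 2, size=500)    # stand-in binary labels

def augment(X, y, copies=3, noise=0.01):
    """Toy augmentation: jittered copies of every sample."""
    X_aug = np.concatenate([X + rng.normal(scale=noise, size=X.shape) for _ in range(copies)])
    y_aug = np.tile(y, copies)
    return X_aug, y_aug

# Leaky: augment first, then split -- near-duplicates of the same sample
# end up on both sides of the split, which inflates test accuracy.
X_bad, y_bad = augment(X, y)
Xtr_bad, Xte_bad, ytr_bad, yte_bad = train_test_split(
    X_bad, y_bad, test_size=0.2, random_state=0)

# Safer: hold out the test set first, then augment only the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, y_train = augment(X_train, y_train)
```

Augmenting before the split puts near-copies of the same sample in both train and test, so the test score mostly measures memorization.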

24

memberjan6 t1_jd7tont wrote

Combine leakage, well-known datasets, small test sets, the wrong metric (i.e., accuracy on very unbalanced data), and otherwise good data science choices, and it is not unexpected to see perfect accuracy. It certainly can be that accurate, but who cares, given all these other possible failings in the analysis.

10

Alternative_iggy t1_jd7bgm4 wrote

I don't typically work on breast cancer histopathology models, but I do work with medical imaging full time as my day job. If I'm reading this correctly, they use the Wisconsin Breast Cancer dataset (originally released in 1995!): https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)

First question - have breast cancer histopathology evaluation techniques changed since 1995? Checking out a quick lit review - yes: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8642363/#Sec2

So is this dataset likely to be useful today? Well… we don't know the demographics of the population, we don't know the split of tumor severity in the population (these could all be easy cancers and not very generalizable or useful compared to what someone sees day to day!), and the preprocessing would require someone to take the digital image and extract all of these features, which honestly probably takes about as much time as the pathologist looking at the image and evaluating it. Also, it sort of looks like they just used the features that came with the dataset…

They report 100% accuracy on the training set and 99% on the test set. Fine, but theoretically any model can get to 100% accuracy on the training set, so I almost always ignore that number unless there is a substantial drop-off between training and testing or vice versa. Next question: are these results in line with similar published results on this particular dataset? Here's an arXiv paper from 2019 with similar results: https://arxiv.org/pdf/1902.03825.pdf

So nothing new here… it seems it’s possible and has been previously published to get 99% accuracy on this dataset…

Next question: is Procedia a good journal? It publishes conference proceedings and has an impact factor of 0.8 (kind of low). It's unlikely this went through a rigorous peer review process, although I don't like to dismiss conference proceedings out of hand, because some of the big clinical trial results and huge breakthroughs get dumped in places like that. But in this case it looks like two researchers trying to get a paper out rather than a groundbreaking discovery (people have published on this dataset before and gotten 99% with a random forest before!).

Final conclusion: meh.

19

PassionatePossum t1_jd7eado wrote

Claims of 100% accuracy always set off alarm bells.

I do work in the medical field and the problem is that there are lots of physicians who want to make easy money: Start a startup, collect some data (which is easy for them), download some model they have read about but don't really understand and start training.

I work for a medical device manufacturer and sometimes have to evaluate startups. And the errors they make are sometimes so basic that it becomes clear that they don't have the first clue what they are doing.

One of those startups claimed 99% accuracy on ultrasound images. But upon closer inspection their product was worthless. Apparently they knew that they needed to split their data into training/validation/test sets.

So what did they do? They took the videos and randomly assigned frames to one of these sets. And since two consecutive frames are very similar to each other, of course you are going to get 99% accuracy. It just means absolutely nothing.
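
For what it's worth, here's a rough sketch of the right way to do that kind of split (not their code, obviously): group the frames by the video they came from, so no video contributes frames to both sides.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_frames = 1000
X = rng.normal(size=(n_frames, 64))            # stand-in frame features
y = rng.integers(0, 2, size=n_frames)          # stand-in labels
video_id = rng.integers(0, 50, size=n_frames)  # which video each frame came from

# A plain train_test_split over frames scatters near-identical consecutive
# frames across train and test, so "test" accuracy mostly measures memorization.

# Group-aware split: every frame of a given video lands in exactly one set.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=video_id))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Sanity check: no video appears on both sides of the split.
assert set(video_id[train_idx]).isdisjoint(set(video_id[test_idx]))
```

With a group-aware split like this, that kind of inflated 99% usually disappears on the spot.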

8

Alternative_iggy t1_jd7fwg8 wrote

So true. I always think of the skin cancer detection model that turned out to predict anything with an arrow pointing at it as cancer, because all of the cancerous lesions in its training set had arrows. (A paper showing this ended up in JAMA.)

6

PassionatePossum t1_jd7h9dr wrote

:facepalm:

Yeah, that is exactly the level of mistakes I have to deal with.

Another classic that I see repeated over and over again is wildly unbalanced datasets: Some diseases are very rare, so for every sample of the disease you are looking for, there are 10000 or more samples that are normal. And often, they just throw it into a classifier and hope for the best.

And then you can also easily get 99% accuracy, but the only thing the classifier has learned is to say "normal tissue", regardless of the input.
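
A toy illustration of the numbers (made-up data, just to show how hollow accuracy is at that kind of prevalence):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score

n = 100_000
y = np.zeros(n, dtype=int)
y[:10] = 1                                        # 10 diseased out of 100,000
X = np.random.default_rng(0).normal(size=(n, 5))  # features are irrelevant here

# A classifier that always predicts the majority class ("normal tissue").
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

print("accuracy:", accuracy_score(y, y_pred))             # 0.9999
print("recall:  ", recall_score(y, y_pred))               # 0.0 -- misses every case
print("f1:      ", f1_score(y, y_pred, zero_division=0))  # 0.0
```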

5

memberjan6 t1_jd7sa57 wrote

A GPT might be engineered to read papers and report findings of common basic errors in analysis design, like the ones you found.

Probability calibration could be added later, via telemetry revealing the accuracy of its own basic-error classifications.

0

lucaatom t1_jd79s23 wrote

Could it be that there is some tag within the image, set by doctors?

2

xixi_cheng t1_jd7lf0g wrote

It's always important to approach any claim of 100% accuracy with a critical eye. Achieving 100% accuracy is nearly impossible on any practical dataset, and it is usually an indication of overfitting or other statistical biases in the model.

It is also essential to examine the data transformation and feature selection process used in the model as these can have a significant impact on model performance and biases. It's important to ensure that these processes are transparent, unbiased, and validated using appropriate statistical methods.

2

SupplyChainPhd t1_jd74jyc wrote

Looked fine at first glance, imo. I know nothing about medicine, but it's nice to see that each of the models evaluated was in the upper 90s on accuracy.

I would want to see how the model performs on a much larger data set before trusting the validity.

1

memberjan6 t1_jd7ubar wrote

A dumb model gets 99% accuracy on a disease with 1% prevalence, easily and "correctly".

Be careful what you ask for, e.g., accuracy rather than F1.

2

memberjan6 t1_jd7t53u wrote

It's plausible to see 100% accuracy, especially on well-studied datasets or a small set of new data. On a 20-example test set in the wild, I witnessed exactly this. Twenty is super small, but the rules forbade me from using more.

1

le4mu t1_jd8cyno wrote

Don't believe a paper until you have their code and run it.

1

BrotherAmazing t1_jda5jna wrote

Either it's an easy problem where 98%-100% accuracy on samples this size is just typical and not really worth publishing, or (not mutually exclusive) the study is flawed.

One could get a totally independent dataset of FNA images, with these features extracted, from different patients in different years, etc., and run their random forest on those. If it gets 98%-100% accuracy, then this is not a hard problem (the feature engineering might have been hard; not taking away from that if so!). If it fails miserably or just gets way lower than 100%, you know the study was flawed.

There are so many ML neophytes making “rookie mistakes” with this stuff who don’t fully grasp basic concepts that I think you always need a totally new independent test set that the authors didn’t have access to in order to really test it. That’s even a good idea for experts to be honest.

The paper's conclusion, that random forests are "superior" for this application, is likely wrong either way. Did they get an expert in XGBoost, neural networks, etc., and put as much time and effort into those techniques, using the same training and test sets, to see if they also got 99%-100%? It didn't appear so from my cursory glance.

1