Submitted by SuchOccasion457 t3_11bvmia in MachineLearning
Say one wanted to model how much getting access to data would cost. How should one go about that? If labeling costs for, say, CIFAR10 are known with SageMaker and Google Cloud, what is the cost of getting the data in the first place?
Furthermore, say we move into the space of medical images, e.g. MRI scans. What is the cost of getting MRI scans with a given disease? Where do I even find such information?
PassionatePossum t1_ja0llsg wrote
The data is always the most expensive part. I work in the medical device industry, and the cost strongly depends on the type of data and how much effort it takes the physicians to collect it.
In the simplest case you can just run a recording device while they are doing their procedures. But of course it is rarely that simple: you need to be careful not to capture any data that can be used to personally identify the patient (and the definition of personally identifying information is, at least in Europe, extremely broad).
The next question is: Do you need any lab data as ground truth? If the answer is "yes", it will create a lot of effort for the physicians because they cannot simply record the data. They will have to keep track of the patients, recordings and diagnoses and annotate them accordingly later.
Another thing to keep in mind: in many cases you cannot just connect a non-certified device to a medical device. You often need special recording hardware that is medically certified. That is probably mostly the case for surgical devices. The rules for MRI images might be more relaxed. I don't know.
As a rough guideline you can expect to pay physicians around 200€ / hour (in the U.S. likely even more than that). And as I said: how much data you get for that strongly depends on the type of data that you collect.
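To make that guideline concrete, here is a minimal back-of-the-envelope sketch in Python. Only the 200€ / hour rate comes from the comment above. The minutes of physician time per sample, the fixed costs (e.g. certified recording hardware) and all names in the code are hypothetical placeholders that you would have to replace with numbers for your own type of data.

```python
from dataclasses import dataclass


@dataclass
class CollectionPlan:
    """Hypothetical inputs for a rough data-acquisition cost estimate."""
    samples: int                                 # number of recordings wanted
    physician_minutes_per_sample: float          # assumption: depends heavily on the data type
    physician_rate_eur_per_hour: float = 200.0   # rough guideline quoted in the thread
    fixed_costs_eur: float = 0.0                 # assumption: e.g. certified recording hardware


def estimate_cost_eur(plan: CollectionPlan) -> float:
    """Total cost = fixed costs + physician hours * hourly rate."""
    physician_hours = plan.samples * plan.physician_minutes_per_sample / 60.0
    return plan.fixed_costs_eur + physician_hours * plan.physician_rate_eur_per_hour


def cost_per_sample_eur(plan: CollectionPlan) -> float:
    """Average cost of one recording under the plan."""
    return estimate_cost_eur(plan) / plan.samples


if __name__ == "__main__":
    # Made-up scenario: 1000 recordings, 6 minutes of physician time each,
    # plus 5000 EUR of fixed costs.
    plan = CollectionPlan(samples=1000,
                          physician_minutes_per_sample=6.0,
                          fixed_costs_eur=5000.0)
    print(estimate_cost_eur(plan))    # 25000.0
    print(cost_per_sample_eur(plan))  # 25.0
```

The model is deliberately linear. If ground-truth lab data or later annotation is needed, the per-sample physician time is the number that grows, so that is the parameter worth estimating carefully.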