PassionatePossum t1_jd7h9dr wrote

:facepalm:

Yeah, that is exactly the level of mistakes I have to deal with.

Another classic that I see repeated over and over again is wildly unbalanced datasets: Some diseases are very rare, so for every sample of the disease you are looking for, there are 10000 or more samples that are normal. And often, they just throw it into a classifier and hope for the best.

And then you can also easily get 99% accuracy, but the only thing the classifier has learned is to say "normal tissue", regardless of the input.
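To make that concrete, here is a toy sketch (mine, nothing to do with any real product), assuming a 10,000:1 class ratio and scikit-learn for the metrics:

```python
# Toy illustration: with 10,000 "normal" samples for every diseased sample,
# a model that always answers "normal" already looks great on accuracy
# while being clinically useless.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

y_true = np.concatenate([np.zeros(10_000, dtype=int),   # 0 = normal tissue
                         np.ones(1, dtype=int)])        # 1 = disease
y_pred = np.zeros_like(y_true)                          # classifier that learned nothing

print(accuracy_score(y_true, y_pred))                   # ~0.9999
print(recall_score(y_true, y_pred, zero_division=0))    # 0.0 -> misses every disease case
print(balanced_accuracy_score(y_true, y_pred))          # 0.5 -> no better than guessing
```

Class weighting, resampling, or simply reporting per-class recall and precision instead of raw accuracy are the usual ways to catch this.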

5

PassionatePossum t1_jd7eado wrote

Claims of 100% accuracy always set off alarm bells.

I do work in the medical field and the problem is that there are lots of physicians who want to make easy money: Start a startup, collect some data (which is easy for them), download some model they have read about but don't really understand and start training.

I work for a medical device manufacturer and sometimes have to evaluate startups. And the errors they make are sometimes so basic that it becomes clear that they don't have the first clue what they are doing.

One of those startups claimed 99% accuracy on ultrasound images. But upon closer inspection their product was worthless. Apparently they knew that they needed to split their data into training/validation/test sets.

So what did they do? They took the videos and randomly assigned frames to one of these sets. And since two consecutive frames are very similar to each other, of course you are going to get 99% accuracy. It just means absolutely nothing.
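In case anyone wonders what the correct split looks like: group by video, not by frame. A rough sketch using scikit-learn's GroupShuffleSplit (array names like `video_ids` are made up):

```python
# Sketch: keep all frames of a video in the same split so near-duplicate
# frames cannot leak between training and test data.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Placeholder data: frames are feature vectors, video_ids[i] says which
# video frame i was taken from.
frames = np.random.rand(1000, 64)
labels = np.random.randint(0, 2, size=1000)
video_ids = np.repeat(np.arange(50), 20)    # 50 videos, 20 frames each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(frames, labels, groups=video_ids))

# No video contributes frames to both sides of the split.
assert set(video_ids[train_idx]).isdisjoint(video_ids[test_idx])
```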

8

PassionatePossum t1_jd62upb wrote

I don't think you could have more red flags in a job posting if you tried: No technical know-how, but a big unspecified vision. No employees. I guess that means no salary either. "Access to high-net-worth individuals", so no funding yet?

Sounds like you have nothing: no funding, no team, no idea whether what you are doing is technically feasible, how much it would cost, or what resources you would need. And on top of that, you want people to sign up for "something" and assume all the financial risk themselves. Does that sound about right? It is basically the "ideas guy" meme.

12

PassionatePossum t1_jb4977c wrote

Agreed. Sometimes theoretical analysis doesn't transfer to the real world, and sometimes it is also valuable to see a complete system, because the whole training process matters.

However, since my days in academia are over, I am much less interested in squeezing the next 0.5% of performance out of some benchmark dataset. In industry you are far more interested in a well-working solution that you can produce quickly than in the best-performing solution. So I am much more interested in a toolset of ideas that generally work well, and ideally in knowing what their limitations are.

And yes, while papers about applications can provide practical validation of these ideas, very few of them conduct proper ablation studies. In most cases that is also too much to ask: pretty much any application is a complex system with an elaborate pre-processing and training procedure. You cannot practically evaluate the influence of every single step and parameter. You just twiddle with the parameters you deem most important, and that is your ablation study.

5

PassionatePossum t1_jb0xvdo wrote

Thanks. I'm a sucker for this kind of research: Take a simple technique and evaluate it thoroughly, varying one parameter at a time.

It often is not as glamorous as some of the applied stuff, but IMHO these papers are a lot more valuable. With all the applied research papers, all you know in the end is that someone got better results, but nobody knows where those improvements actually came from.

411

PassionatePossum t1_jalnb1q wrote

I think, my professor summarized it very well: "Genetic algorithms is what you do when everything else fails."

What he meant by that is that they are very inefficient optimizers. You need to evaluate lots and lots of configurations because you are stepping around more or less blindly in the parameter space, relying only on luck and a few heuristics to improve your fitness. But their advantage is that they will always work, as long as you can define some sort of fitness function.

If you can get a gradient, you are immediately more efficient because you already know in which direction you need to step to get a better solution.

But of course there is room for all algorithms. Even when you can do gradient descent, there are problems where it quickly gets stuck in a local optimum. There are approaches for "restarting" the algorithm to find a better local optimum; I'm not that familiar with that kind of optimization, but it is not inconceivable that genetic algorithms have a role to play in such a scenario.
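To illustrate the "blind stepping plus fitness function" point, here is a minimal genetic-algorithm toy of my own (the fitness function is deliberately trivial):

```python
# Minimal genetic algorithm: mutate, evaluate fitness, keep the best.
# It only needs a fitness function -- no gradient -- which is exactly why it
# is so general and so sample-hungry.
import numpy as np

def fitness(x):
    # Toy objective: maximize the negative of a shifted quadratic.
    return -np.sum((x - 3.0) ** 2)

rng = np.random.default_rng(0)
pop = rng.normal(size=(50, 5))            # 50 candidate solutions, 5 parameters each

for generation in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]                  # select the 10 fittest
    children = np.repeat(parents, 5, axis=0)
    children += rng.normal(scale=0.1, size=children.shape)   # blind mutation
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(best)   # should end up near [3, 3, 3, 3, 3]
```

With a gradient you would walk almost straight to the optimum; the GA only gets there by brute-force sampling and selection.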

24

PassionatePossum t1_ja0sovs wrote

Sorry, I am not aware of any literature. The contractual stuff is mostly handled between our lawyers and the hospital's; I'm not really involved in any of that.

And I'm not even sure that you'll find a one-size-fits-all answer. The requirements for a collaboration can vary wildly. A private practice usually has much more flexibility when it comes to technical infrastructure; in a hospital, the IT department usually wants to know when you are planning to connect something to their systems. But there are certain diseases that you are very unlikely to see in a private practice.

I would do the following: Once you know what kind of data you need, talk to physicians to understand their workflow. Then make a proposal for how to collect the data and talk it through with them or their lawyers. If there is a potential problem with the plan, they'll tell you.

Once you know the workflow, you'll probably also have an idea how long it will take for them to collect the data you are looking for and from there you can make an educated guess how much it is going to cost you.

The rest is up for negotiation. As far as I know, we have contracts with built-in safeguards for both the physicians and us: they get a fixed price for a fixed number of hours they work for us, and they guarantee a certain minimum number of recorded procedures. If they can deliver more in the allotted time, even better.

2

PassionatePossum t1_ja0llsg wrote

The data is always the most expensive part. I work in the medical device industry and it strongly depends on the type of data and how much effort it is for the physicians to collect it.

In the simplest case you can just run a recording device while they are doing their procedures. But of course it rarely is that simple: you need to be careful not to capture any data that can be used to personally identify the patient (and the definition of personally identifying information is, at least in Europe, extremely broad).

The next question is: do you need any lab data as ground truth? If the answer is "yes", it will create a lot of effort for the physicians, because they cannot simply record the data. They will have to keep track of the patients, recordings and diagnoses, and annotate them accordingly later.

Another thing to keep in mind: in many cases you cannot just connect a non-certified device to a medical device. You often need special recording hardware that is medically certified. That is probably mostly the case for surgical devices; the rules for MRI images might be more relaxed. I don't know.

As a rough guideline, you can expect to pay physicians around 200€/hour (in the U.S. likely even more than that). And as I said: how much data you get for that strongly depends on the type of data you collect.

5

PassionatePossum t1_j9t1e09 wrote

I have seen image classification networks used to classify sounds via spectrograms. It is perfectly conceivable to use ML to analyze spectrograms, or to manipulate them and turn them back into sounds. Of course, you can also do that directly with time-series models.
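Roughly, such a pipeline looks like this; a toy sketch of mine assuming SciPy and PyTorch, with made-up shapes and a made-up network:

```python
# Turn a 1-D audio signal into a spectrogram "image" and feed it to a small CNN.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import spectrogram

fs = 16_000                                   # assumed sample rate
audio = np.random.randn(fs * 2)               # 2 seconds of placeholder audio
_, _, sxx = spectrogram(audio, fs=fs, nperseg=256)
sxx = np.log1p(sxx)                           # log scale, as is common for spectrograms

x = torch.tensor(sxx, dtype=torch.float32)[None, None]  # (batch, channel, freq, time)

model = nn.Sequential(                        # tiny image-style classifier
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 2),                         # e.g. two sound classes
)
logits = model(x)
print(logits.shape)                           # torch.Size([1, 2])
```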

But as long as you have a problem that can be modeled mathematically, you are usually better off sticking to mathematical models. They are usually more computationally efficient and more predictable.

1

PassionatePossum t1_j8y9oam wrote

Reply to comment by [deleted] in [D] Coauthor Paper? by [deleted]

I assume that you are based in the U.S. I'm not really familiar with the U.S. system of "grad school" so take what I say with a grain of salt.

Publishing a paper is certainly a good way to show your professor that you are capable of doing research, but it is probably not absolutely necessary. Having a reputation as a reliable and capable student should also go a long way towards convincing your professor that you are a good candidate.

Working with one of the PhD students on their research project should also be a good way to earn your professor's trust.

1

PassionatePossum t1_j8w86k2 wrote

People end up as co-authors on papers for all sorts of reasons. Some co-authors contributed as much to the paper as the primary author, but most of the time the co-authors didn't do a whole lot (maybe they just provided some data). Without knowing anything about the paper and how it was produced, I tend to assume the latter.

But as an undergraduate, it is definitely something you can point to during interviews. Having already worked on a research project (even if only in a minor capacity) makes you more interesting as a candidate. And it serves as a nice entry point into the interview: from there, one can discuss what exactly you contributed, what you learned while doing so, and so on.

So I would say: Notable, yes. Something special, no.

9

PassionatePossum t1_j7f1xr8 wrote

I'm not sure I follow the question. Why would there need to be special research for high-FPS cameras? The challenge with all video-based systems is to capture long-range dependencies, and "long-range" is defined over the number of frames; how much time elapses between frames doesn't really matter.

However, if you have a high-FPS camera and a slowly moving scene, you'll have a lot of images that are pretty much identical to each other. That means, according to information theory, there is very little additional information in each frame. In that case you might want to consider temporal downsampling of your data. If you have a fast-moving scene and you really need to update your prediction for every single frame, the only constraint is processing power.
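What I mean by temporal downsampling is nothing fancy; a quick sketch assuming OpenCV (the stride of 5 and the file name are arbitrary):

```python
# Keep only every N-th frame of a high-FPS recording before feeding it to a model.
import cv2

STRIDE = 5                                  # e.g. 150 fps -> effectively 30 fps
cap = cv2.VideoCapture("recording.mp4")     # hypothetical file name

frames = []
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % STRIDE == 0:                 # drop the nearly identical in-between frames
        frames.append(frame)
    index += 1
cap.release()

print(f"kept {len(frames)} frames")
```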

So in that case, the problem of inference for high-FPS cameras is the same as that of computationally efficient models in general. And there are a few models that are intended to run on mobile devices. Maybe you want to look into those.

4

PassionatePossum t1_j67yw1f wrote

Here is what I do. For laptops I always want to have a good battery life. I don't need my laptop to be particularly powerful. I have a normal business laptop running Linux. I personally like Lenovo laptops because they play nice with Linux and I just love the keyboard.

I work remotely. We have our own compute server. But you could just as easily work on an AWS instance.

To me it makes more sense to use a laptop for what it is good at (mobility) and a stationary server for what it is good at (power). In the past I've tried to make compromises, and it always ended up being a lousy compromise.

1

PassionatePossum t1_j4uf4dn wrote

Part of the problem is the voting system, because it allows a party to have an absolute majority in parliament despite only having a plurality of the popular vote. So despite the fact that they have less than 50% of the popular vote, they (more or less) get to push 100% of their policy goals.

In a system of proportional representation, a party with only a plurality would need to form coalitions and make compromises, so that at least some policy goals of the coalition partner are on the agenda.

22

PassionatePossum t1_j429c7y wrote

Admittedly, I just skimmed the paper. But I found it weird that DVC wasn’t mentioned at all. Maybe I missed something but it seems to address the same or at least a similar use case.

I think it would deserve at least a mention in the related-work section and a discussion of what is different or better in XetHub. Even a performance comparison between the two would be interesting.

35

PassionatePossum t1_j2xnqus wrote

It absolutely can, but it depends on the task.

When it comes to questions of human estimation, there is the wisdom-of-the-crowd effect, where individual people can either over- or underestimate a value, but the average of many predictions is often extremely accurate. You can either train a system directly on the average of many human predictions, or have only a single prediction for each sample and let the errors average out during training.
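The first option is literally just averaging the annotations; a tiny sketch with made-up numbers:

```python
# Average several noisy human estimates per sample and use the result as the
# regression target during training.
import numpy as np

# human_estimates: shape (n_samples, n_annotators), placeholder values
human_estimates = np.array([[4.8, 5.3, 5.1],
                            [2.1, 1.8, 2.4]])
targets = human_estimates.mean(axis=1)   # the "crowd" target, roughly [5.07, 2.1]
print(targets)
```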

And even for tasks where a definite ground truth is available that doesn't depend on human estimation, it can happen. For example, in many medical applications the diagnosis is provided by the lab, but physicians often need to decide whether it is worth taking a biopsy and sending it to the lab. For this they are often taught heuristics: a few features by which they decide whether to have a sample analyzed.

If you take samples and use the ground truth provided by the lab, it is not inconceivable that a deep-learning-based classifier can discover additional features that physicians don't use, and that these lead to better predictions. Obviously, it won't get better than the lab results.

As you might have guessed, I work in the medical field. And we have done that evaluation for one of our products: it is not able to outperform the top-of-the-line experts in the field, but it is easily able to outperform the average physician.

8

PassionatePossum t1_j2st7br wrote

42 examples is obviously very little to go on. But one way to do it would be to use AdaBoost: you can use the classifier weights to assign feature importance.

A very similar problem is addressed in the Viola-Jones face detector (see Viola and Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", 2001), where they select a couple thousand features out of over 100k candidates.
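A rough sketch of the AdaBoost route with scikit-learn (the data here is random placeholder data, so treat it purely as a template):

```python
# Fit AdaBoost on the tiny dataset and rank features by the boosted ensemble's
# importance scores. With only 42 samples, treat the ranking as a rough hint.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(42, 100))        # 42 samples, 100 candidate features (placeholder)
y = rng.integers(0, 2, size=42)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
ranking = np.argsort(clf.feature_importances_)[::-1]
print(ranking[:10])                   # indices of the ten most important features
```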

1

PassionatePossum t1_j231qxj wrote

We protect our models with TPMs. The model is stored on the device in encrypted form, using a device-specific key. During boot-up, the TPM compiles the state of the system into a hash value and then decrypts the model. If the system or the software running on it has been modified, this decryption fails.

The nice thing about TPMs is that they are write-only. Nobody gets to see the key, except for the TPM.
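For anyone who has never touched a TPM, here is a purely conceptual Python sketch of the sealing idea: derive the decryption key from measurements (hashes) of the system state, so that any modification breaks decryption. A real deployment uses the TPM's own sealing against PCR values (e.g. via tpm2-tools) and never handles the key in application code like this; the file paths below are just examples.

```python
# Conceptual only: the key is derived from hashes ("measurements") of boot
# components, so a modified kernel or binary yields a different key and the
# encrypted model can no longer be decrypted. A real TPM does this in hardware
# and never exposes the key at all.
import base64
import hashlib
from cryptography.fernet import Fernet

def measure(paths):
    h = hashlib.sha256()
    for p in paths:                      # hypothetical list of measured components
        with open(p, "rb") as f:
            h.update(f.read())
    return h.digest()

measurement = measure(["/boot/vmlinuz", "/usr/bin/inference_app"])  # example paths
key = base64.urlsafe_b64encode(measurement)     # 32-byte digest -> valid Fernet key

with open("model.enc", "rb") as f:              # hypothetical encrypted model blob
    encrypted_model = f.read()
model_bytes = Fernet(key).decrypt(encrypted_model)   # fails if any measurement changed
```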

However, be careful not to use GPLv3 licensed software. GPLv3 not only requires you to open-source the software (which is something I could live with) but also demands complete access to the hardware you are running it on (which is completely bonkers).

5

PassionatePossum t1_j0ch2oy wrote

It is not true that neural networks are only suitable for big datasets. It is just that modern deep architectures that are trained from scratch require relatively large datasets.

But there are plenty of alternatives if you want to do ML with small datasets. Use pre-trained networks if possible. Use smaller network architectures. Or don’t use neural networks at all. There are lots of approaches from the era before deep neural networks (e.g. support vector machines, decision trees, probabilistic models, like Bayesian networks) and they are not obsolete.
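For example, a small-data baseline with an SVM and cross-validation might look like this (dataset and parameters are placeholders):

```python
# On a tiny dataset, an SVM with cross-validation is often a more honest
# baseline than a deep network trained from scratch.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X, y = X[:100], y[:100]                      # pretend we only have 100 samples

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV gives a more stable estimate
print(scores.mean(), scores.std())
```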

Choose the right approach for the problem, and don't think that just because deep neural networks are the new hotness you need a neural network.

3

PassionatePossum t1_izi24ow wrote

You can still parallelize with batch gradient descent. If you use, for example, the MirroredStrategy in TensorFlow, the batch is split up between multiple GPUs. The only downside is that this approach doesn't scale well if you want to train on more than one machine, since the model needs to be synced after each iteration.

But you should think long and hard about whether training on multiple machines is really necessary, since that brings a whole new set of problems. 700 GB is not that large; we handle datasets of that size all the time. I don't know what kind of model you are trying to train, but we have a GPU server with 8 GPUs and I've never felt the need to go beyond the normal MirroredStrategy for parallelization. And should you run into the problem that you cannot fit the data onto the machine where you are training: load it over the network.

You just need to make sure that your input pipeline supports that efficiently. Shard your dataset so you can have many concurrent I/O operations.
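Roughly, the combination of MirroredStrategy and a sharded tf.data pipeline looks like this; the shard pattern, feature spec and model below are placeholders:

```python
# Multi-GPU training on one machine with a sharded TFRecord input pipeline.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()           # one replica per visible GPU

feature_spec = {                                      # hypothetical record layout
    "x": tf.io.FixedLenFeature([64], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.float32),
}

def parse(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    return example["x"], example["y"]

files = tf.data.Dataset.list_files("data/shard-*.tfrecord")   # hypothetical shards
dataset = (
    files.interleave(tf.data.TFRecordDataset,                 # many concurrent readers
                     num_parallel_calls=tf.data.AUTOTUNE)
         .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
         .shuffle(10_000)
         .batch(256)
         .prefetch(tf.data.AUTOTUNE)
)

with strategy.scope():                                # variables are mirrored on every GPU
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(dataset, epochs=10)                         # each global batch is split across GPUs
```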

And in case scaling really is important to you, may I suggest looking into Horovod?

2

PassionatePossum t1_ivnplxx wrote

The Chinese room argument has always seemed like philosophical bullshit to me. Right from the start, it assumes that there is a difference between "merely following rules" and "true intelligence".

Whatever our brain is doing, it is also following rules, namely the laws of physics that govern how electrical charges build up and are passed on to the next neurons. I hope nobody is arguing that we couldn't simulate a brain, at least in principle, because suggesting otherwise would be to believe in magic.

And if there is no fundamental difference between following rules and intelligence, the whole argument just becomes silly: intelligence isn't something that is either there or not, it becomes a spectrum, and the only interesting question that remains is at which point we define intelligence as "human-like".

16

PassionatePossum t1_iu3k7ga wrote

For me, speed is only important when it comes to inference. And for the problems I work on (medical imaging) I don't scale the hardware to the model, I scale the model to the hardware.

I don't train models that often, and when I do, I don't really care whether it takes 2 days or 2.5 days. Of course, if you are working on really large-scale problems or need to search through a huge parameter space, you will care about speed, but I doubt the average guy is that interested. During training, the most important resource for me is VRAM.

3