MedicUK_ t1_j0xv5np wrote on December 20, 2022 at 5:28 AM

Hey everyone, I’m a medical doctor practising in the UK. I'm planning a project that uses machine learning to predict mortality in patients with colorectal cancer. I was going to use a supervised approach with data from a series of time points, e.g. value X on each of the 5 days after the operation. Is it possible for machine learning to use a trend to predict an outcome, as opposed to a static value at one point in time? If so, what statistical approach would be best?

trnka t1_j1hdb1o wrote on December 24, 2022 at 10:24 AM

For prediction of mortality I'd suggest looking into survival analysis. The challenge with mortality is that you don't know when everyone will die; you only know about the deaths that have happened so far. This is called censoring. One way to work with such data is to reframe the problem as "predict whether patient P will be alive D days after their operation".

A quick Google suggests that 90-day mortality is a common metric, so I'd suggest starting there. For each patient you'd want to record their status at 90 days as alive/dead/unknown. From there you could use traditional machine learning methods.
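As a rough sketch of the alive/dead/unknown labelling, assuming each patient record has an operation date, an optional date of death, and the last date they were known to be alive (the function and field names here are made up for illustration):

```python
from datetime import date, timedelta
from typing import Optional


def mortality_label(
    op_date: date,
    death_date: Optional[date],
    last_seen_alive: date,
    horizon_days: int = 90,
) -> str:
    """Return 'dead', 'alive', or 'unknown' for mortality within horizon_days."""
    cutoff = op_date + timedelta(days=horizon_days)
    if death_date is not None:
        # A recorded death settles the label either way.
        return "dead" if death_date <= cutoff else "alive"
    if last_seen_alive >= cutoff:
        # Seen alive at or after the cutoff, so they survived the window.
        return "alive"
    # Follow-up ended before the cutoff: the outcome is censored.
    return "unknown"


# Hypothetical patients, all operated on 1 Jan 2022.
op = date(2022, 1, 1)
print(mortality_label(op, date(2022, 2, 1), date(2022, 2, 1)))  # died inside the window
print(mortality_label(op, None, date(2022, 6, 1)))              # followed past the window
print(mortality_label(op, None, date(2022, 2, 1)))              # lost to follow-up early
```

The "unknown" group is the censored one: those patients left follow-up before day 90, so their outcome can't be used as a plain alive/dead label.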

If the time points are standardized across patients you could use them like regular features, for instance feature1_at_day1, feature1_at_day2, ... If they aren't standardized across patients you need to get them into the same representation first. I'd suggest starting simple, maybe something like feature1_week1_avg, feature1_week2_avg, and so on. If you want to get fancier about using the trend of the measurement as input, you could fit a curve to each feature for each patient over time and use the parameters of the curve as inputs. Say you fit a linear equation, y = mx + b, where x = time since operation and y = the measurement you care about. You would then estimate m and b for each patient and use those two numbers as inputs to your model. (All that said, definitely start simple.)
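To make the curve-fitting idea concrete, here is a minimal sketch that fits y = mx + b to each patient's measurements by ordinary least squares and keeps m and b as features. The patient IDs, values, and feature names are invented for illustration, and it works even when patients were measured on different days:

```python
def fit_line(times, values):
    """Least-squares fit of y = m*x + b. Returns (m, b)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time points to fit a line")
    mean_x = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in times)
    if sxx == 0:
        raise ValueError("all measurements were taken at the same time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(times, values))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: (days since operation, measured value) per patient.
patients = {
    "patient_a": ([1, 2, 3, 4, 5], [10.0, 12.0, 14.0, 16.0, 18.0]),
    "patient_b": ([1, 3, 5], [20.0, 17.0, 14.0]),
}

features = {}
for patient_id, (days, measurements) in patients.items():
    m, b = fit_line(days, measurements)
    features[patient_id] = {"feature1_slope": m, "feature1_intercept": b}

print(features)
```

Each patient ends up with the same two columns no matter how many measurements they had, which is what lets you feed the result into an ordinary tabular model.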

The biggest challenge I'd expect is that you probably don't have many deaths in your data, so a machine learning model is likely to overfit. For dealing with that I'd suggest starting very, very simple, like regularized logistic regression to predict 90-day mortality. Keep in mind that adding features may not help you if you don't have many deaths to learn from.
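For a feel of what the regularization does, here is a small plain-Python sketch of L2-regularized logistic regression trained by gradient descent. It is written out by hand only so the penalty term is visible; in practice a library implementation such as scikit-learn's LogisticRegression would be the usual choice. The data is made up, and it assumes patients with an unknown 90-day status have already been set aside.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of death for one feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_l2(X, y, l2=1.0, lr=0.1, epochs=2000):
    """Minimize mean log loss + (l2 / (2n)) * ||w||^2; the bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The l2 * w[j] term pulls each weight toward zero.
            w[j] -= lr * (grad_w[j] + l2 * w[j]) / n
        b -= lr * grad_b / n
    return w, b


# Made-up example: one standardized feature (say, a slope); 1 = died by day 90.
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 0, 1]

w_weak, b_weak = fit_logistic_l2(X, y, l2=0.01)
w_strong, b_strong = fit_logistic_l2(X, y, l2=10.0)
print("weak penalty weight:  ", w_weak[0])
print("strong penalty weight:", w_strong[0])
```

A stronger penalty shrinks the weight toward zero, which makes the model less able to chase noise when there are only a handful of deaths. The penalty strength itself would normally be chosen by cross-validation.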

Hope this helps! I've worked in medical machine learning for years and done some survival analysis but not much. We were in primary care so there was very little mortality to deal with.