Submitted by AutoModerator in MachineLearning
sanman wrote
How to Handle Lots of Missing/Null Values in Data?
There's a data set I've been given to analyze, and it has a lot of missing data. Normally I would replace missing values with the mean, mode, etc. But one particular column has nearly 70% null values. What is the threshold for rejecting a column as unsuitable for analysis, instead of trying to replace those missing values? How large a proportion of missing values is acceptable before I have to discard the column altogether? Is there some rule of thumb for this?
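A minimal sketch of the workflow the question describes, in plain Python (stdlib only): measure the fraction of nulls in each column, drop columns above a cutoff, and fill the remaining gaps with the mean (numeric columns) or mode (everything else). The 0.5 cutoff, the function names (`missing_fraction`, `impute_value`, `clean`), and the list-of-dicts row layout are hypothetical choices made for illustration. The question is precisely that there is no agreed threshold, so a column that is ~70% null gets dropped or kept depending entirely on where the cutoff is set.

```python
from collections import Counter
from statistics import mean

# Hypothetical cutoff: no universal rule exists; 0.5 is only an illustrative assumption.
MAX_MISSING_FRACTION = 0.5


def missing_fraction(rows, column):
    """Fraction of rows whose value for `column` is None (missing/null)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def impute_value(values):
    """Mean for numeric columns, mode for anything else; None values are ignored."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return mean(present)
    return Counter(present).most_common(1)[0][0]


def clean(rows, max_missing=MAX_MISSING_FRACTION):
    """Drop columns whose missing fraction exceeds `max_missing`,
    then fill the remaining gaps with that column's mean or mode.
    Returns (cleaned_rows, dropped_column_names)."""
    columns = sorted({key for row in rows for key in row})
    kept = [c for c in columns if missing_fraction(rows, c) <= max_missing]
    fills = {c: impute_value([row.get(c) for row in rows]) for c in kept}
    cleaned = [
        {c: fills[c] if row.get(c) is None else row[c] for c in kept}
        for row in rows
    ]
    dropped = [c for c in columns if c not in kept]
    return cleaned, dropped


# Toy data: "income" is 75% null, "age" and "city" are 25% null.
rows = [
    {"age": 30, "city": "A", "income": None},
    {"age": None, "city": "B", "income": None},
    {"age": 40, "city": "A", "income": 50000},
    {"age": 50, "city": None, "income": None},
]
cleaned, dropped = clean(rows)
print(dropped)     # ['income']
print(cleaned[1])  # {'age': 40, 'city': 'B'}
```

Raising `max_missing` to, say, 0.8 would keep `income` and fill its gaps with the single observed value. That illustrates why filling a mostly-empty column with its mean or mode is risky: the "filled" column ends up reflecting very few real observations.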