Submitted by hopedallas t3_zmaobm in MachineLearning
Far-Butterscotch-436 t1_j0a9083 wrote
5% imbalance isn't bad. Just use a cost function that handles imbalance, e.g. a weighted binomial deviance, and you'll be fine.
Also, you can build a downsampling ensemble and compare performance. Don't downsample to 50/50, try for at least 10%.
You've got a good problem: lots of observations with few features.
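A minimal sketch of the weighted-deviance idea, using scikit-learn's `class_weight='balanced'` (which reweights the log-loss, i.e. binomial deviance, inversely to class frequency). The dataset here is a synthetic stand-in for the OP's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
X = rng.normal(size=(n, 5))
# Roughly 5-7% positives, loosely tied to the first feature
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 3.3).astype(int)

# class_weight='balanced' scales each class's contribution to the
# log-loss by n_samples / (n_classes * class_count), so errors on the
# rare class cost proportionally more than errors on the common one.
clf = LogisticRegression(class_weight="balanced").fit(X, y)
proba = clf.predict_proba(X)[:, 1]
```

The same `class_weight` keyword works for most sklearn classifiers (random forests, gradient boosting via `sample_weight`, SVMs), so you can compare models without rebalancing the data itself.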
trendymoniker t1_j0acn6e wrote
👆
1e6:1 is extreme. 1e3:1 is often realistic (think views to shares on social media). 18:1 is actually a pretty good real-world ratio.
If it were me, I’d just change the weights for each class in the loss function to get them more or less equal.
190m examples isn’t that many either — don’t worry about it. Compute is cheap — it’s ok if it takes more than one machine and/or more time.
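The per-class loss weighting above can be sketched as a weighted binary cross-entropy; here `pos_weight` would be set near the negative:positive ratio (e.g. 18 for an 18:1 imbalance). Names are illustrative, not from the thread:

```python
import numpy as np

def weighted_bce(y_true, p_pred, pos_weight):
    """Mean binary cross-entropy with positive examples
    up-weighted by pos_weight."""
    p = np.clip(p_pred, 1e-12, 1 - 1e-12)  # avoid log(0)
    losses = -(pos_weight * y_true * np.log(p)
               + (1.0 - y_true) * np.log(1.0 - p))
    return losses.mean()

y = np.array([1.0, 0.0, 0.0, 0.0])
p = np.array([0.5, 0.5, 0.5, 0.5])
# With pos_weight=3, the single positive contributes as much total
# loss as the three negatives combined.
```

Most frameworks expose this directly (e.g. a positive-class weight argument on their binary cross-entropy loss), so you rarely need to hand-roll it.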
hopedallas OP t1_j0ailbn wrote
Thanks for the hint. Sorry not sure what you mean by “try for 10%”?
Far-Butterscotch-436 t1_j0ajlfn wrote
When you downsample try to get at least 1:10 ratio (minority:majority)
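A sketch of that downsampling step, assuming a binary label array where 1 is the minority class (function and argument names are hypothetical):

```python
import numpy as np

def downsample_majority(X, y, ratio=10, seed=0):
    """Keep all minority rows (y == 1) and at most `ratio` times as
    many randomly chosen majority rows (y == 0), giving roughly a
    1:ratio minority:majority split."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    n_keep = min(len(majority_idx), ratio * len(minority_idx))
    keep = rng.choice(majority_idx, size=n_keep, replace=False)
    idx = np.concatenate([minority_idx, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

Drawing several such samples with different seeds and training one model per sample is one way to build the downsampling ensemble mentioned above.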