Submitted by qazokkozaq t3_102983u in MachineLearning
Hi,
I am facing a classification problem like this: I have measurements of 18K different variables for 42 samples. Each sample is labeled class_0 or class_1, and the classes are split nearly evenly (19 belong to class_0, 23 belong to class_1). What is the right approach for reducing these features to a minimal set such that a classifier still predicts the correct classes?
I am not providing any domain knowledge for now, but I can hint a little more if needed.
HateRedditCantQuitit t1_j2sinhq wrote
I think people often see this sort of p >> N data in genetics?
ESL II has a whole chapter on p >> N problems (ch 18) https://hastie.su.domains/ElemStatLearn/
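In case a concrete starting point helps, here is a minimal sketch of one simple option for p >> N data: rank features by a two-sample t-statistic, keep the top k, and classify with nearest centroids. This is an assumption on my part about what might suit the problem, not a method prescribed by the book. The important detail is that the feature selection is redone inside every leave-one-out fold; selecting on all 42 samples first and cross-validating afterwards would give an overly optimistic accuracy. The data below is synthetic (200 features instead of 18K, with 5 planted informative ones), and all function names are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t-statistic for two lists of floats."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_k_features(X, y, k):
    """Sorted indices of the k features with the largest |t| between the two classes."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def nearest_centroid_predict(X, y, feats, x_new):
    """Assign x_new to the class whose centroid is closest on the selected features."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [row for row, l in zip(X, y) if l == label]
        dist = sum((x_new[j] - statistics.mean(r[j] for r in rows)) ** 2 for j in feats)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy with feature selection repeated inside each fold."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_k_features(X_tr, y_tr, k)  # the held-out sample is never seen here
        correct += nearest_centroid_predict(X_tr, y_tr, feats, X[i]) == y[i]
    return correct / len(X)


# Synthetic stand-in for the real data: 42 samples (19 vs 23), 200 features,
# of which features 0..4 are shifted by 3 standard deviations in class 1.
random.seed(0)
n_features, informative = 200, {0, 1, 2, 3, 4}
y = [0] * 19 + [1] * 23
X = [[random.gauss(3.0 if (label == 1 and j in informative) else 0.0, 1.0)
      for j in range(n_features)] for label in y]

selected = top_k_features(X, y, k=5)   # features chosen on the full data set
accuracy = loo_accuracy(X, y, k=5)     # honest estimate of predictive accuracy
print("selected features:", selected)
print("leave-one-out accuracy:", accuracy)
```

With real data, k would itself need tuning, ideally in an inner cross-validation loop, and the features selected in different folds can differ, which is a useful hint about how stable the final feature set is.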