Submitted by SpookyTardigrade t3_zn4egx in MachineLearning
I'm intrigued by random forests, but it looks like there are really no open problems in this area. A quick skim of Google Scholar mostly turns up applications of random forests to various industries and problems. Are there research groups working on random forests themselves?
BrisklyBrusque t1_j0hx440 wrote
Yes, lots. For example, in 2019 a paper introduced a new split rule for categorical variables that reduces computational complexity.
https://peerj.com/articles/6339/
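To give a rough feel for why the handling of categorical variables affects cost, here is a minimal sketch of one well-known trick: order the categories by their mean response once, then treat the variable as ordinal. This is an illustration under my own assumptions, not necessarily the exact rule from the linked paper, and the function name `order_categories_by_mean` and the toy data are hypothetical.

```python
from collections import defaultdict

def order_categories_by_mean(categories, responses):
    """Map each category to an integer rank based on its mean response.

    Hypothetical helper for illustration only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, responses):
        sums[c] += y
        counts[c] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    # Lowest mean response gets rank 0, highest gets rank k - 1.
    ordered = sorted(means, key=lambda c: means[c])
    return {c: rank for rank, c in enumerate(ordered)}

# Toy data (made up): a colour feature and a numeric response.
colors = ["red", "blue", "green", "blue", "red", "green"]
y = [1.0, 0.0, 0.5, 0.0, 1.0, 0.5]

codes = order_categories_by_mean(colors, y)
encoded = [codes[c] for c in colors]
print(codes)
print(encoded)
```

With k categories, an ordered variable has only k - 1 candidate split points, whereas searching over every two-way partition of the categories means checking 2^(k-1) - 1 options.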
A lot of researchers are also exploring adjacent tree ensembles such as extremely randomized trees (2006) and Bayesian additive regression trees (2008). The former is very similar to random forests. There is a strong possibility other tree ensembles have yet to be discovered!
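To make "very similar" a bit more concrete, here is a minimal sketch of the main difference in how a split threshold is chosen for one numeric feature: a random-forest-style tree searches the candidate thresholds for the best one, while an extra-trees-style tree draws a threshold at random. The function names and toy data are my own, and this leaves out everything else a real implementation does (feature subsampling, bootstrapping, recursion).

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)

def exhaustive_threshold(values, labels):
    """Random-forest style: try every midpoint between sorted unique values.

    Assumes the feature takes at least two distinct values.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_threshold(values, rng):
    """Extra-trees style: draw one threshold uniformly between min and max."""
    return rng.uniform(min(values), max(values))

# Toy, perfectly separable data (made up for illustration).
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)

t_best = exhaustive_threshold(values, labels)
t_rand = random_threshold(values, rng)
print("searched:", t_best, split_impurity(values, labels, t_best))
print("random:  ", t_rand, split_impurity(values, labels, t_rand))
```

The random draw is much cheaper per split and usually a worse split, which is the trade the method makes: weaker individual trees in exchange for more randomness across the ensemble.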
If you’re a fan of computer science / optimized code, there is a great deal of research on making tree models faster. The ranger library in R was introduced as a faster alternative to the randomForest package. There is also interest in scaling random forests up to millions of variables, to deal with genetics data.
Hummingbird is a Microsoft project that seeks to refactor common machine learning methods using tensor algebra, so those methods can take advantage of GPUs. I don’t know if they got around to random forests yet.
Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, and many of them are still unanswered.
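As a toy illustration of why diversity matters, here is a sketch of majority voting under a deliberately simple model that I am assuming for the example: every voter is correct with probability p, and with probability rho all voters copy a single shared vote, otherwise they vote independently. Real trees in a forest are correlated in messier ways, so treat this as intuition only; the function names are hypothetical.

```python
from math import comb

def majority_accuracy_independent(n_voters, p):
    """P(majority is correct) for an odd number of independent voters,
    each correct with probability p."""
    need = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(need, n_voters + 1)
    )

def majority_accuracy_mixed(n_voters, p, rho):
    """Toy correlation model: with probability rho every voter copies one
    shared vote (so the ensemble is as good as a single voter); otherwise
    the voters are independent."""
    return rho * p + (1 - rho) * majority_accuracy_independent(n_voters, p)

# Same individual accuracy throughout; only the correlation changes.
for rho in (0.0, 0.5, 1.0):
    print(rho, round(majority_accuracy_mixed(101, 0.6, rho), 3))
```

Each voter is equally accurate in all three cases, yet the ensemble's accuracy falls as the voters become more alike. The hard part in practice is that making trees more diverse tends to make each one less accurate.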