One tree is not enough – Random forest

Hello dear reader,

This week's paper is a classic. It describes how to combine several decision trees into a so-called random forest (a term coined by the paper).
In the end the concept is quite simple (despite all the fancy math in the paper): if you ensemble many (the more the better) DIFFERENT decision trees, the combined model generalizes better and overfits far less than any single tree 🙂
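To make that concrete, here is a minimal sketch (not the paper's reference implementation) of the two sources of randomness Breiman combines: a bootstrap sample of the training data for each tree and a random subset of features at every split. It uses scikit-learn's DecisionTreeClassifier only as the base learner; the class and parameter names (SimpleRandomForest, n_trees, ...) are made up for this example.

```python
# Illustrative random forest: bagging + random feature selection per split.
# Assumes numpy and scikit-learn are installed.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SimpleRandomForest:
    def __init__(self, n_trees=100, max_features="sqrt", random_state=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        n_samples = X.shape[0]
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: draw n_samples rows with replacement.
            idx = self.rng.integers(0, n_samples, n_samples)
            # max_features="sqrt" makes the tree consider only a random
            # subset of features at each split -- this decorrelates the trees.
            tree = DecisionTreeClassifier(
                max_features=self.max_features,
                random_state=int(self.rng.integers(0, 2**31 - 1)),
            )
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        # Majority vote over all trees.
        votes = np.stack([tree.predict(X) for tree in self.trees]).astype(int)
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), axis=0, arr=votes
        )
```

Usage is just `SimpleRandomForest(n_trees=200).fit(X_train, y_train).predict(X_test)` – the point is that each tree sees different data and different features, so their errors are (partly) uncorrelated and the vote averages them out.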


Abstract:

Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Freund and Schapire[1996]), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
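The "internal estimates" the abstract mentions are also what you get off the shelf today. As a sketch (assuming scikit-learn's RandomForestClassifier, which follows Breiman's algorithm with minor differences such as averaging class probabilities instead of voting), the out-of-bag score gives a built-in generalization estimate and the impurity-based feature importances play the role of variable importance (the paper itself uses a permutation-based measure):

```python
# Reading the "internal estimates" from scikit-learn's RandomForestClassifier.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=500,      # more trees -> the OOB estimate stabilizes
    max_features="sqrt",   # random feature selection at each split
    oob_score=True,        # keep out-of-bag predictions for an error estimate
    random_state=0,
).fit(X, y)

print("OOB accuracy:", forest.oob_score_)                 # internal generalization estimate
print("Max feature importance:", forest.feature_importances_.max())
```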

Download Link:

https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf

