G. Galimberti, M. Pillati, G. Soffritti (2011). Notes on the robustness of regression trees against skewed and contaminated errors. Berlin: Springer-Verlag [10.1007/978-3-642-11363-5_29].
Notes on the robustness of regression trees against skewed and contaminated errors
Galimberti, Giuliano; Pillati, Marilena; Soffritti, Gabriele
2011
Abstract
Regression trees are among the most popular tools in predictive data mining applications. However, previous studies have shown that their performance is not entirely satisfactory when the dependent variable is highly skewed, and that it degrades severely in the presence of heavy-tailed error distributions, especially when some values of the dependent variable are grossly mis-measured. In this paper the lack of robustness of some classical regression trees is investigated further by addressing highly skewed and contaminated error distributions. In particular, the performance of some non-robust regression trees is examined through a Monte Carlo experiment and compared with that of some recently proposed trees based on M-estimators, which were designed to robustify this kind of method. Results from the analysis of a real dataset are also reported.
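The abstract does not describe the specific tree algorithms studied, so the following is only a minimal sketch of the general idea, not the authors' method. It compares a least-squares split criterion (node mean, sum of squared residuals) with one based on Huber's M-estimator of location on hypothetical data: a step function in one predictor with a single grossly mis-measured response. The tuning constant `C = 1.345`, the MAD-based scale, the minimum leaf size and all helper names (`huber_location`, `best_split`, etc.) are assumptions chosen for illustration.

```python
import statistics

C = 1.345  # conventional Huber tuning constant (assumption, not from the paper)


def mad_scale(y, center):
    """Normalized median absolute deviation (consistent for Gaussian errors)."""
    return statistics.median(abs(v - center) for v in y) / 0.6745


def huber_location(y, c=C, tol=1e-8, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted averaging.

    The auxiliary scale is held fixed at the MAD around the median.
    """
    mu = statistics.median(y)
    s = mad_scale(y, mu)
    if s == 0:
        return mu  # more than half the values coincide: use the median
    for _ in range(max_iter):
        # Huber weights: 1 for small standardized residuals, c/|r| for large ones.
        w = [min(1.0, c / abs((v - mu) / s)) if v != mu else 1.0 for v in y]
        new_mu = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def huber_loss(r, c=C):
    """Quadratic near zero, linear in the tails."""
    a = abs(r)
    return 0.5 * a * a if a <= c else c * (a - 0.5 * c)


def node_cost(y, robust, scale):
    """Impurity of one node: Huber loss around an M-estimate, or SSE around the mean."""
    if robust:
        mu = huber_location(y)
        return sum(huber_loss((v - mu) / scale) for v in y)
    mu = statistics.fmean(y)
    return sum((v - mu) ** 2 for v in y)


def best_split(x, y, robust, min_leaf=2):
    """Threshold on x minimizing the summed child cost (exhaustive search)."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    # One common scale from the parent keeps the two children's costs comparable.
    scale = mad_scale(ys, statistics.median(ys)) or 1.0
    best_cost, best_threshold = float("inf"), None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # cannot split between tied x values
        cost = node_cost(ys[:i], robust, scale) + node_cost(ys[i:], robust, scale)
        if cost < best_cost:
            best_cost, best_threshold = cost, (xs[i - 1] + xs[i]) / 2
    return best_threshold


# Hypothetical data: a step at x = 9.5 plus one grossly mis-measured response.
x = list(range(20))
y = [0.0] * 10 + [10.0] * 10
y[5] = 1000.0  # contaminated value

print(best_split(x, y, robust=False), best_split(x, y, robust=True))  # prints: 5.5 9.5
```

In this toy case the single contaminated value pulls the least-squares split away from the true change point, because squared residuals let one observation dominate the impurity, while the Huber-based criterion grows only linearly in that residual and still recovers the step at 9.5. This only illustrates the kind of sensitivity the abstract refers to; it says nothing about the magnitude of the effects reported in the paper.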