Abstract
Regression trees are among the most popular tools in predictive data mining applications. However, previous studies have shown that their performance is not completely satisfactory when the dependent variable is highly skewed, and degrades severely in the presence of heavy-tailed error distributions, especially when values of the dependent variable are grossly mis-measured. In this paper the lack of robustness of some classical regression trees is investigated by addressing the issue of highly skewed and contaminated error distributions. In particular, the performance of some non-robust regression trees is evaluated through a Monte Carlo experiment and compared to that of some recently proposed trees based on M-estimators, designed to robustify this class of methods. Finally, the results obtained from the analysis of a real dataset are presented.
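The robustification strategy the abstract refers to, replacing the least-squares split criterion with one based on an M-estimator, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of the Huber loss, the function names (`huber_loss`, `huber_location`, `best_split`), and the tuning constant `delta` are all illustrative assumptions. The idea is that each candidate child node is summarized by an M-estimate of location rather than the mean, so a few grossly mis-measured responses cannot dominate the split choice.

```python
def huber_loss(residual, delta=1.0):
    """Huber rho function: quadratic near zero, linear in the tails,
    so large outlying residuals contribute only linearly to the cost."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def huber_location(y, delta=1.0, iters=50):
    """M-estimate of location via iteratively reweighted averaging,
    started from (roughly) the median."""
    mu = sorted(y)[len(y) // 2]
    for _ in range(iters):
        w = [1.0 if abs(v - mu) <= delta else delta / abs(v - mu) for v in y]
        mu = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
    return mu

def best_split(x, y, delta=1.0):
    """Pick the threshold on a single predictor x that minimizes the
    total Huber loss of the two resulting child nodes."""
    pairs = sorted(zip(x, y))
    best_cost, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        mu_l = huber_location(left, delta)
        mu_r = huber_location(right, delta)
        cost = (sum(huber_loss(v - mu_l, delta) for v in left)
                + sum(huber_loss(v - mu_r, delta) for v in right))
        if cost < best_cost:
            best_cost = cost
            best_thr = 0.5 * (pairs[i - 1][0] + pairs[i][0])
    return best_cost, best_thr
```

With a contaminated response (one value of 100 in a node whose other values are near 10), the linear tails of the Huber loss keep the outlier from pulling the split away from the true change point, whereas a squared-error criterion would be far more sensitive to it.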
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Galimberti, G., Pillati, M., Soffritti, G. (2011). Notes on the Robustness of Regression Trees Against Skewed and Contaminated Errors. In: Ingrassia, S., Rocci, R., Vichi, M. (eds) New Perspectives in Statistical Modeling and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11363-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11362-8
Online ISBN: 978-3-642-11363-5
eBook Packages: Mathematics and Statistics (R0)