Abstract
Regression trees are among the most popular tools in predictive data mining applications. However, previous studies have shown that their performance is not completely satisfactory when the dependent variable is highly skewed, and degrades severely in the presence of heavy-tailed error distributions, especially when values of the dependent variable are grossly mis-measured. In this paper the lack of robustness of some classical regression trees is investigated by addressing the issue of highly skewed and contaminated error distributions. In particular, the performance of some non-robust regression trees is evaluated through a Monte Carlo experiment and compared to that of some recently proposed trees based on M-estimators, designed to robustify this class of methods. Finally, the results obtained from the analysis of a real dataset are presented.
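The robustification strategy the abstract refers to, replacing the least-squares split criterion with one based on an M-estimator, can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of the Huber loss, the function names (`huber_loss`, `huber_location`, `best_split`), and the tuning constant `delta` are all illustrative assumptions. The idea is that each candidate child node is summarized by an M-estimate of location rather than the mean, so a few grossly mis-measured responses cannot dominate the split choice.

```python
def huber_loss(residual, delta=1.0):
    """Huber rho function: quadratic near zero, linear in the tails,
    so large outlying residuals contribute only linearly to the cost."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def huber_location(y, delta=1.0, iters=50):
    """M-estimate of location via iteratively reweighted averaging,
    started from (roughly) the median."""
    mu = sorted(y)[len(y) // 2]
    for _ in range(iters):
        w = [1.0 if abs(v - mu) <= delta else delta / abs(v - mu) for v in y]
        mu = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
    return mu

def best_split(x, y, delta=1.0):
    """Pick the threshold on a single predictor x that minimizes the
    total Huber loss of the two resulting child nodes."""
    pairs = sorted(zip(x, y))
    best_cost, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        mu_l = huber_location(left, delta)
        mu_r = huber_location(right, delta)
        cost = (sum(huber_loss(v - mu_l, delta) for v in left)
                + sum(huber_loss(v - mu_r, delta) for v in right))
        if cost < best_cost:
            best_cost = cost
            best_thr = 0.5 * (pairs[i - 1][0] + pairs[i][0])
    return best_cost, best_thr
```

With a contaminated response (one value of 100 in a node whose other values are near 10), the linear tails of the Huber loss keep the outlier from pulling the split away from the true change point, whereas a squared-error criterion would be far more sensitive to it.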
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Galimberti, G., Pillati, M., Soffritti, G. (2011). Notes on the Robustness of Regression Trees Against Skewed and Contaminated Errors. In: Ingrassia, S., Rocci, R., Vichi, M. (eds) New Perspectives in Statistical Modeling and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11363-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11362-8
Online ISBN: 978-3-642-11363-5
eBook Packages: Mathematics and Statistics (R0)