Notes on the Robustness of Regression Trees Against Skewed and Contaminated Errors

  • Conference paper
New Perspectives in Statistical Modeling and Data Analysis

Abstract

Regression trees are among the most popular tools in predictive data mining applications. However, previous studies have shown that their performance is not completely satisfactory when the dependent variable is highly skewed, and that it degrades severely in the presence of heavy-tailed error distributions, especially when some values of the dependent variable are grossly mis-measured. This paper investigates the lack of robustness of some classical regression trees under highly skewed and contaminated error distributions. In particular, the performance of some non-robust regression trees is evaluated through a Monte Carlo experiment and compared with that of some trees based on M-estimators, which were recently proposed to robustify this kind of method. Finally, the results obtained from the analysis of a real dataset are presented.
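The contrast between a non-robust and an M-estimator-based fit can be illustrated at the level of a single leaf, where a classical regression tree predicts the mean of the responses. The following is a minimal sketch, not the procedure used in the paper: it assumes a Huber M-estimator of location with tuning constant 1.345 and a MAD-based scale, and the contamination scheme (10% of errors shifted by 50), the sample sizes and the function names `huber_location` and `simulate` are all illustrative assumptions.

```python
import random
import statistics


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means.

    Assumption: the scale is fixed at the normalised MAD of the sample.
    """
    values = list(values)
    med = statistics.median(values)
    scale = 1.4826 * statistics.median([abs(v - med) for v in values])
    if scale == 0:
        return med
    mu = med
    for _ in range(max_iter):
        # Weight 1 inside the threshold, down-weighted outside it.
        weights = [
            1.0 if abs(v - mu) <= c * scale else c * scale / abs(v - mu)
            for v in values
        ]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def simulate(n_rep=200, n=100, contamination=0.1, shift=50.0, seed=0):
    """Monte Carlo MSE of the mean and the Huber estimate in one leaf.

    Hypothetical design: standard normal errors around a true leaf
    value, with a fraction of observations grossly shifted upwards.
    """
    rng = random.Random(seed)
    true_mu = 2.0
    se_mean = 0.0
    se_huber = 0.0
    for _ in range(n_rep):
        sample = []
        for _ in range(n):
            y = true_mu + rng.gauss(0.0, 1.0)
            if rng.random() < contamination:
                y += shift  # grossly mis-measured response
            sample.append(y)
        se_mean += (statistics.fmean(sample) - true_mu) ** 2
        se_huber += (huber_location(sample) - true_mu) ** 2
    return se_mean / n_rep, se_huber / n_rep


if __name__ == "__main__":
    mse_mean, mse_huber = simulate()
    print(f"MSE of leaf mean:      {mse_mean:.3f}")
    print(f"MSE of Huber estimate: {mse_huber:.3f}")
```

Under this assumed design the leaf mean is pulled towards the shifted observations, while the Huber estimate down-weights them. A full robust tree would also need a robust splitting criterion, which this sketch does not cover.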

Author information

Corresponding author

Correspondence to Giuliano Galimberti.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Galimberti, G., Pillati, M., Soffritti, G. (2011). Notes on the Robustness of Regression Trees Against Skewed and Contaminated Errors. In: Ingrassia, S., Rocci, R., Vichi, M. (eds) New Perspectives in Statistical Modeling and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11363-5_29
