Olivier Renaud, Maria-Pia Victoria-Feser (2010). A robust coefficient of determination for regression. Journal of Statistical Planning and Inference, 140(7), 1852-1862 [10.1016/j.jspi.2010.01.008].
A robust coefficient of determination for regression
Maria-Pia Victoria-Feser
2010
Abstract
To assess the quality of the fit in a multiple linear regression, the coefficient of determination, or R², is a very simple tool, yet the one most used by practitioners. Indeed, it is reported in most statistical analyses, and although it is not recommended as a final model selection tool, it provides an indication of the suitability of the chosen explanatory variables in predicting the response. In the classical setting, it is well known that the least-squares fit and coefficient of determination can be arbitrary and/or misleading in the presence of a single outlier. In many applied settings, the assumption of normality of the errors and the absence of outliers are difficult to establish. In these cases, robust procedures for estimation and inference in linear regression are available and provide a suitable alternative.

In this paper we present a companion robust coefficient of determination that has several desirable properties not shared by others. It is robust to deviations from the specified regression model (such as the presence of outliers), it is efficient if the errors are normally distributed, and it does not make any assumption on the distribution of the explanatory variables (and therefore no assumption on the unconditional distribution of the responses). We also show that it is a consistent estimator of the population coefficient of determination. A simulation study and two real datasets support the appropriateness of this estimator, compared with the classical (least-squares) and several previously proposed robust R², even for small sample sizes. (C) 2010 Elsevier B.V. All rights reserved.
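The abstract's remark that a single outlier can make the classical coefficient of determination misleading can be illustrated with a minimal sketch. This is not the robust estimator proposed in the paper. It only computes the ordinary least-squares R² on a small synthetic dataset, before and after one response value is corrupted. The data, the helper name `r_squared`, and the size of the corruption are all assumptions made for illustration.

```python
import numpy as np


def r_squared(x, y):
    """Classical least-squares R^2 for a simple linear regression of y on x."""
    # Design matrix with an intercept column and one explanatory variable.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical data: an almost exact linear relation y = 2x + 1
# with a small deterministic perturbation of +/- 0.5.
x = np.arange(1.0, 11.0)
perturbation = 0.5 * (-1.0) ** np.arange(x.size)
y_clean = 2.0 * x + 1.0 + perturbation

# Corrupt a single response (the last one) to mimic one gross outlier.
y_contaminated = y_clean.copy()
y_contaminated[-1] = -30.0

r2_clean = r_squared(x, y_clean)
r2_contaminated = r_squared(x, y_contaminated)

print(f"R^2 without outlier:  {r2_clean:.3f}")
print(f"R^2 with one outlier: {r2_contaminated:.3f}")
```

In this sketch nine of the ten observations still follow the linear relation closely, yet the classical R² falls sharply once the single point is corrupted, which is the kind of behaviour that motivates a robust alternative.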