In this paper, we consider the problem of outliers in incomplete multivariate data when the aim is to estimate a measure of mean and covariance, as is the case, for example, in factor analysis. In such a situation, the ER algorithm of Little and Smith (1987), which combines the EM algorithm for missing data with a robust estimation step based on an M-estimator, could be used. However, the ER algorithm as originally proposed can fail to be robust in some cases, especially in high dimensions. We propose two alternatives to avoid this problem. One is to combine a small modification of the ER algorithm with a so-called high-breakdown estimator as the starting point for the iterative procedure; the other is to base the estimation step of the ER algorithm on a high-breakdown estimator. Among the high-breakdown estimators that are built to keep their robustness properties even when the number of variables is relatively large, we consider the minimum covariance determinant (MCD) estimator and the t-biweight S-estimator. Simulated and real data are used to compare and illustrate the different procedures.
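To make the idea of a minimum covariance determinant estimate concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes complete bivariate data, searches every subset of size h exhaustively (feasible only for very small samples), and applies no consistency correction. The function names, the toy data, and the choice h = 6 are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mean_cov_2d(points):
    """Sample mean and covariance (divisor n - 1) of bivariate points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def det_2x2(c):
    """Determinant of a 2 x 2 matrix given as nested tuples."""
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]


def brute_force_mcd(points, h):
    """Raw MCD: mean and covariance of the h-subset whose sample
    covariance matrix has the smallest determinant.
    Exhaustive search, so only usable for tiny illustrative samples."""
    best = None
    for subset in combinations(points, h):
        mean, cov = mean_cov_2d(subset)
        d = det_2x2(cov)
        if best is None or d < best[0]:
            best = (d, mean, cov, subset)
    return best[1], best[2], best[3]


# Hypothetical toy data: six points close to the line y = x, two outliers.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1),
        (6.0, 6.0), (20.0, -15.0), (22.0, -14.0)]
h = 6  # assumed subset size for this illustration

mcd_mean, mcd_cov, mcd_subset = brute_force_mcd(data, h)
classical_mean, classical_cov = mean_cov_2d(data)

print("classical mean:", classical_mean)
print("MCD mean:      ", mcd_mean)
```

In this toy sample the classical mean is pulled toward the two outlying points, whereas the subset selected by the determinant criterion excludes them. Handling missing observations, which is the subject of the paper, is not attempted in this sketch.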

Cheng, T., Victoria Feser (2002). High-breakdown estimation of multivariate mean and covariance with missing observations. British Journal of Mathematical & Statistical Psychology, 55(2), 317-335 [10.1348/000711002760554615].

High-breakdown estimation of multivariate mean and covariance with missing observations

Victoria Feser
2002

Cheng, T.-C.; Victoria Feser

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/950553