We apply a multi-step methodology, called ROBOUT, to detect outliers in a dependent variable conditional on its predictors, where the predictors themselves must be identified from a large number of candidates. ROBOUT comprises a preliminary imputation procedure to identify potential leverage outliers, a robust variable selection step (via LASSO-penalized Huber loss regression), a robust regression step (via the MM method), and an outlier detection step. We show in a simulation study that ROBOUT is the most effective methodology in terms of outlier detection, predictor recovery, and coefficient estimation, even compared with existing integrated procedures such as SPARSE-LTS and RLARS. We then apply ROBOUT to a real granular dataset on euro area banks, with the aim of predicting the logarithm of cash holdings (log-cash). We identify the granular indicators most relevant for predicting log-cash, such as financial guarantees to households or debt securities and term loans to non-financial corporations, and we simultaneously recover the two banks with wrongly reported log-cash values, as well as a set of banks whose log-cash values are disproportionately high relative to the values of the identified predictors.
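The selection, regression, and detection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions: the preliminary imputation step is omitted, the MM step is replaced by a simpler stand-in (Tukey bisquare reweighting started from the Huber fit, with no high-breakdown S-estimate), and the function names `huber_lasso` and `bisquare_irls`, the penalty `lam`, the cutoff of 3 robust standard deviations, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def huber_lasso(X, y, lam, delta=1.345, n_iter=500):
    """Proximal gradient for mean Huber loss + lam * ||beta||_1 (intercept unpenalized)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    theta = np.zeros(Z.shape[1])
    theta[0] = np.median(y)
    # The Huber-loss gradient is Lipschitz with constant ||Z||_2^2 / n.
    step = n / np.linalg.norm(Z, 2) ** 2
    for _ in range(n_iter):
        psi = np.clip(y - Z @ theta, -delta, delta)  # bounded residual influence
        theta = theta + step * (Z.T @ psi) / n       # gradient step
        b = theta[1:]
        theta[1:] = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft-threshold
    return theta

def mad_scale(r):
    """Median absolute deviation, rescaled to be consistent for Gaussian noise."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))

def bisquare_irls(X, y, start, c=4.685, n_iter=50):
    """Simplified stand-in for the MM step: bisquare IRLS with a fixed MAD scale."""
    Z = np.column_stack([np.ones(len(y)), X])
    theta = start.copy()
    scale = mad_scale(y - Z @ theta)
    for _ in range(n_iter):
        u = (y - Z @ theta) / (c * scale)
        w = np.where(np.abs(u) < 1, (1 - u ** 2) ** 2, 0.0)  # zero weight for gross outliers
        sw = np.sqrt(w)
        theta = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)[0]
    return theta

# Simulated data (assumption): 3 true predictors out of 50, 10 shifted responses.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.standard_normal(n)
y[:10] += 15.0  # planted outliers in the dependent variable

# Step 1: robust variable selection.
fit = huber_lasso(X, y, lam=0.3)
selected = np.flatnonzero(fit[1:])

# Step 2: robust refit on the selected predictors only.
start = np.concatenate([[fit[0]], fit[1:][selected]])
theta = bisquare_irls(X[:, selected], y, start)

# Step 3: flag observations with large standardized residuals.
resid = y - theta[0] - X[:, selected] @ theta[1:]
flagged = np.flatnonzero(np.abs(resid) / mad_scale(resid) > 3.0)
```

In this sketch the Huber threshold `delta` is expressed in the units of `y`, which is reasonable here only because the simulated noise has unit variance; with real data the response would first need a robust rescaling.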
Farne, M., Vouldis, A. (2024). The Importance of Robust Second-Stage Regressions for Financial Data. Cham: Springer.
The Importance of Robust Second-Stage Regressions for Financial Data
Matteo Farne
2024