Laura Anderlucci, Angela Montanari, Francesca Fortunato (2018). Ensemble Classification with Random Projections: classifier selection and variable importance.
Ensemble Classification with Random Projections: classifier selection and variable importance
Laura Anderlucci; Angela Montanari; Francesca Fortunato
2018
Abstract
Random Projections (RP) ensemble classifiers improve classification accuracy while extending to the high-dimensional context methods originally developed for low-dimensional data. However, reducing redundancy and understanding the properties of the variable ranking induced by the RP ensemble classifier are still open issues. In fact, although such classifiers greatly improve classification accuracy, they do not allow the identification of the variables with the highest discriminative power, and their performance could still be enhanced by selecting a good subset of them. With the aim of identifying both the most accurate subset of classifiers and the most discriminant input features, in this work we investigated two different directions. On the one hand, combining the original idea of using the Multiplicative Binomial Distribution (MBD) as the reference model to describe and predict the ensemble accuracy with an important result on this distribution, we devised a simple forward-selection technique called the Ensemble Selection Algorithm (ESA). On the other hand, inspired by the Random Forest (RF) process for feature selection, we adjusted the RP ensemble classifier so as to keep the information on variable importance. Specifically, we measured the relative importance of each input feature through a specific coefficient, called Variable Importance in Projection (VIP), and then removed the variables with the smallest values of this coefficient. Results of applying both the ESA and the VIP criterion to simulated and real data demonstrate that our proposal successfully controls the misclassification rate while using a very small number of individual classifiers and ranking the features in terms of their discriminative power.
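To make the general idea of an RP ensemble with classifier selection concrete, the following is a minimal Python sketch under stated assumptions. It is not the ESA of the paper: the abstract does not give the MBD-based selection criterion, so a plain greedy forward selection on validation accuracy is used as a stand-in. The nearest-centroid base classifier, the Gaussian projection matrices, the synthetic data, and all function names (`make_data`, `fit_rp_ensemble`, `member_votes`, `majority_vote`, `greedy_forward_selection`) are illustrative assumptions. The VIP coefficient is not sketched because its definition is not given in the abstract.

```python
import numpy as np

def make_data(n, p, shift, rng):
    """Synthetic two-class Gaussian data (assumption): class 1 is shifted in every coordinate."""
    y = np.arange(n) % 2                      # alternating labels, so both classes are present
    X = rng.normal(size=(n, p)) + shift * y[:, None]
    return X, y

def fit_rp_ensemble(X, y, n_proj, d, rng):
    """Draw n_proj Gaussian random projections (p -> d) and fit a nearest-centroid rule in each."""
    p = X.shape[1]
    members = []
    for _ in range(n_proj):
        A = rng.normal(size=(p, d)) / np.sqrt(d)
        Z = X @ A
        centroids = np.stack([Z[y == k].mean(axis=0) for k in (0, 1)])
        members.append((A, centroids))
    return members

def member_votes(members, X):
    """Return an (n_members, n_samples) array of 0/1 predictions, one row per ensemble member."""
    votes = []
    for A, centroids in members:
        Z = X @ A
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        votes.append(dist.argmin(axis=1))
    return np.array(votes)

def majority_vote(votes):
    """Combine member predictions by majority vote; ties go to class 1."""
    return (votes.mean(axis=0) >= 0.5).astype(int)

def greedy_forward_selection(votes, y, max_size):
    """Generic stand-in for a forward selection (NOT the MBD-based ESA):
    repeatedly add the member that maximises the accuracy of the majority vote,
    and stop when no candidate improves it or max_size is reached."""
    max_size = min(max_size, votes.shape[0])
    selected, best_acc = [], -1.0
    while len(selected) < max_size:
        candidates = [j for j in range(votes.shape[0]) if j not in selected]
        accs = [np.mean(majority_vote(votes[selected + [j]]) == y) for j in candidates]
        k = int(np.argmax(accs))
        if accs[k] <= best_acc:
            break
        selected.append(candidates[k])
        best_acc = float(accs[k])
    return selected, best_acc

rng = np.random.default_rng(0)
X_train, y_train = make_data(200, 50, 2.0, rng)
X_val, y_val = make_data(200, 50, 2.0, rng)

members = fit_rp_ensemble(X_train, y_train, n_proj=25, d=5, rng=rng)
val_votes = member_votes(members, X_val)
full_acc = float(np.mean(majority_vote(val_votes) == y_val))
selected, sel_acc = greedy_forward_selection(val_votes, y_val, max_size=10)

print("full ensemble accuracy:", full_acc)
print("selected members:", selected, "accuracy:", sel_acc)
```

In this sketch the selection is scored on the same validation set it is tuned on, so the reported accuracy of the selected subset is optimistic; a separate test set would be needed for an honest estimate.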