The main two drawbacks of nearest neighbor based classifiers are: high CPU costs when the number of samples in the training set is high and performance extremely sensitive to outliers. Several attempts of overcoming such drawbacks have been proposed in the pattern recognition field aimed at selecting/generating an adequate subset of prototypes from the training set. The problem addressed in this paper concerns the comparison of methods for prototype reduction; several methods for finding a good set of prototypes are evaluated: particle swarm optimization; clustering algorithm; genetic algorithm; learning prototypes and distances. Experiments are carried out on several classification problems in order to evaluate the considered approaches in conjunction with different nearest neighbor based classifiers: 1-nearest-neighbor classifier, 5-nearest-neighbor classifier, nearest feature plane based classifier, nearest feature line based classifier. Moreover, we propose a method for creating an ensemble of the classifiers, where each classifier is trained with a different reduced set of prototypes. Since these prototypes are generated using a supervised optimization function, we have called our ensemble: "supervised bagging"’. The training phase consists in repeating N times the prototype generation, then the scores resulting from classifying a test pattern using each set of prototypes are combined by the ‘‘vote rule’’. The reported results show the superiority of this method with respect to the well known bagging approach for building ensembles of classifiers. Our results are obtained when 1-nearest-neighbor classifier is coupled with a ‘‘supervised’’ bagging ensemble of learning prototypes and distances. As expected, the approaches for prototype reduction proposed for 1-nearest-neighbor classifier do not work so well when other classifiers are tested. In our experiments the best method for prototype reduction when different classifiers are used is the genetic algorithm.
L. Nanni, A. Lumini (2011). Prototype reduction techniques: A comparison among different approaches. EXPERT SYSTEMS WITH APPLICATIONS, 38(9), 11820-11828 [10.1016/j.eswa.2011.03.070].
Prototype reduction techniques: A comparison among different approaches
LUMINI, ALESSANDRA
2011
Abstract
The main two drawbacks of nearest neighbor based classifiers are: high CPU costs when the number of samples in the training set is high and performance extremely sensitive to outliers. Several attempts of overcoming such drawbacks have been proposed in the pattern recognition field aimed at selecting/generating an adequate subset of prototypes from the training set. The problem addressed in this paper concerns the comparison of methods for prototype reduction; several methods for finding a good set of prototypes are evaluated: particle swarm optimization; clustering algorithm; genetic algorithm; learning prototypes and distances. Experiments are carried out on several classification problems in order to evaluate the considered approaches in conjunction with different nearest neighbor based classifiers: 1-nearest-neighbor classifier, 5-nearest-neighbor classifier, nearest feature plane based classifier, nearest feature line based classifier. Moreover, we propose a method for creating an ensemble of the classifiers, where each classifier is trained with a different reduced set of prototypes. Since these prototypes are generated using a supervised optimization function, we have called our ensemble: "supervised bagging"’. The training phase consists in repeating N times the prototype generation, then the scores resulting from classifying a test pattern using each set of prototypes are combined by the ‘‘vote rule’’. The reported results show the superiority of this method with respect to the well known bagging approach for building ensembles of classifiers. Our results are obtained when 1-nearest-neighbor classifier is coupled with a ‘‘supervised’’ bagging ensemble of learning prototypes and distances. As expected, the approaches for prototype reduction proposed for 1-nearest-neighbor classifier do not work so well when other classifiers are tested. In our experiments the best method for prototype reduction when different classifiers are used is the genetic algorithm.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.