L. Nanni, A. Lumini (2007). Ensemblator: an ensemble of classifiers for reliable classification of Biological Data. Pattern Recognition Letters, 28, 622-630 [10.1016/j.patrec.2006.10.012].
Ensemblator: an ensemble of classifiers for reliable classification of Biological Data
Nanni, Loris; Lumini, Alessandra
2007
Abstract
This paper studies the problem of building a machine learning method for biological data. Various feature selection methods and classifier design strategies have been used and compared. However, most published articles apply a single technique to a single dataset, and only recently have several researchers compared these techniques across several public datasets. We propose an ensemble of classifiers that combines a linear classifier (a linear support vector machine), a non-linear classifier (a radial basis function support vector machine), and a subspace classifier. We validate the method on several recent publicly available datasets, both by predictive accuracy on test samples and by cross-validation. Compared with the best performance of other current methods, our strategy can yield remarkably improved results. On a wide range of recently published datasets, it performs better than, or at least comparably to, the best current methods known to us.
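As a rough illustration of how the outputs of three base classifiers might be combined, the sketch below fuses per-sample labels by majority vote. The abstract does not state the fusion rule, so majority voting is an assumption here. The three member functions are hypothetical threshold rules standing in for a trained linear SVM, RBF SVM, and subspace classifier; they are not the authors' models.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# A base classifier is modelled as any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the earliest voter."""
    counts = Counter(votes)
    best = max(counts.values())
    return next(v for v in votes if counts[v] == best)


def ensemble_predict(classifiers: Sequence[Classifier],
                     samples: Sequence[Sequence[float]]) -> List[Hashable]:
    """Label each sample with the majority vote of all base classifiers."""
    return [majority_vote([clf(x) for clf in classifiers]) for x in samples]


# Hypothetical stand-ins for the three trained ensemble members (assumptions,
# chosen only so the example runs without any external library).
def linear_member(x: Sequence[float]) -> int:
    # Linear decision boundary: x0 + x1 > 1
    return 1 if x[0] + x[1] > 1.0 else 0


def rbf_member(x: Sequence[float]) -> int:
    # Non-linear (radial) decision region centred on (1, 1)
    return 1 if (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2 < 0.5 else 0


def subspace_member(x: Sequence[float]) -> int:
    # Decision based on a single coordinate, i.e. a one-dimensional subspace
    return 1 if x[0] > 0.5 else 0


members = [linear_member, rbf_member, subspace_member]
samples = [(0.0, 0.0), (1.0, 1.0), (0.9, 0.0)]
predictions = ensemble_predict(members, samples)
print(predictions)
```

For the third sample only the subspace stand-in votes for class 1, so the ensemble follows the other two members. In practice the callables would wrap the `predict` methods of trained models, and a different fusion rule (for example, summing class scores) could be swapped in for `majority_vote`.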