A reliable system for predicting bacterial virulent proteins is important both for finding novel drugs and vaccines and for understanding virulence mechanisms in pathogens. In this work we propose a bacterial virulent protein prediction method based on an ensemble of classifiers, where the features are extracted directly from the amino-acid sequence of a given protein. It is well known in the literature that features extracted from the evolutionary information of a protein perform better than features extracted from the amino-acid sequence alone. Our method tries to fill the gap between the amino-acid sequence-based approaches and the evolutionary information-based approaches. An extensive evaluation according to a blind testing protocol, where the parameters of the system are calculated using the training set and the system is validated on three different independent datasets, demonstrates the validity of the proposed method.
Nanni, L., Lumini, A. (2009). An ensemble of Support Vector Machines for predicting virulent proteins. EXPERT SYSTEMS WITH APPLICATIONS, 36, 7458-7462 [10.1016/j.eswa.2008.09.036].
An ensemble of Support Vector Machines for predicting virulent proteins
NANNI, LORIS;LUMINI, ALESSANDRA
2009