It is well known in the literature that an ensemble of classifiers obtains better performance than a stand-alone method. Hence, it is important to develop ensemble methods well suited to bioinformatics data. In this work, we propose combining the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a genetic algorithm. The proposed method is applied to the prediction of DNA-binding proteins. Two classifiers are tested: the linear support vector machine and the radial basis function support vector machine. Accuracy and the Matthews correlation coefficient are reported as performance indicators. The Matthews correlation coefficient obtained by our ensemble method is ≈0.97 under jackknife cross-validation; this result outperforms those reported in the literature on the same dataset with features extracted directly from the amino-acid sequence.

Lumini, A., Nanni, L. (2009). An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins. Amino Acids, 36, 167-175 [10.1007/s00726-008-0044-7].

An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins

Lumini, Alessandra; Nanni, Loris
2009

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/73476