Combing Ontologies and Dipeptide composition for predicting DNA-binding proteins / L. Nanni; A. Lumini. - In: AMINO ACIDS. - ISSN 0939-4451. - Print. - 34:(2008), pp. 635-641. [10.1007/s00726-007-0016-3]

Combing Ontologies and Dipeptide composition for predicting DNA-binding proteins

Nanni, Loris; Lumini, Alessandra
2008

Abstract

Given a novel protein, it is important to know whether it is a DNA-binding protein, since DNA-binding proteins play a fundamental role in the regulation of gene expression. In this work, we propose a parallel fusion between a classifier trained on features extracted from the Gene Ontology database and a classifier trained on the dipeptide composition of the protein. The classifiers used are the support vector machine (SVM) and the 1-nearest neighbour. The Matthews correlation coefficient obtained by our fusion method is ≈0.97 under jackknife cross-validation. This outperforms the best result reported in the literature on the same dataset (0.924), where the SVM is trained using only Chou's pseudo amino acid based features. We also report the area under the ROC curve (AUC): the fusion reaches an AUC of 0.995. In particular, our fusion obtains a 5% false negative rate with a 0% false positive rate. The Matthews correlation coefficient obtained using the single best GO number is only 0.7211; hence it is not possible to use the Gene Ontology database as a simple lookup table. Finally, we test the complementarity of the two feature extraction methods using the Q-statistic. We obtain a value of 0.58, which means that the features extracted from the Gene Ontology database and the features extracted from the amino acid sequence are partially independent and that their parallel fusion deserves further study.
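
The three quantities named in the abstract (dipeptide composition, Matthews correlation coefficient, Q-statistic) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes the usual textbook definitions: dipeptide composition as the normalised frequency of each of the 400 ordered pairs of the 20 standard amino acids, and the Q-statistic as Yule's Q computed from the per-sample correctness of two classifiers. All function names are hypothetical and chosen for illustration only.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies (assumed definition)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def q_statistic(correct_a, correct_b):
    """Yule's Q between two classifiers, given per-sample correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1  # both classifiers correct
        elif a:
            n10 += 1  # only the first correct
        elif b:
            n01 += 1  # only the second correct
        else:
            n00 += 1  # both wrong
    denom = n11 * n00 + n01 * n10
    return 0.0 if denom == 0 else (n11 * n00 - n01 * n10) / denom
```

Under this definition, Q ranges from -1 to 1: it is 0 when the two classifiers' errors are statistically independent and 1 when they always succeed and fail on the same samples, so lower values suggest that two classifiers are more complementary.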

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/63183