An empirical study on the matrix-based protein representations and their combination with sequence-based approaches / Loris Nanni;Alessandra Lumini;Sheryl Brahnam. - In: AMINO ACIDS. - ISSN 0939-4451. - STAMPA. - 44:4(2013), pp. 887-901. [10.1007/s00726-012-1416-6]

An empirical study on the matrix-based protein representations and their combination with sequence-based approaches

LUMINI, ALESSANDRA;
2013

Abstract

Many domains have a stake in the development of reliable systems for automatic protein classification. Of particular interest in recent studies is the exploration of new methods for extracting features from a protein that enhance classification for specific problems. These methods have proven very useful in one or two domains, but they have failed to generalize well across several domains (i.e., classification problems). In this paper we evaluate several feature extraction approaches for representing proteins, with the aim of sequence-based protein classification. Several protein representations are evaluated, starting from: the position-specific scoring matrix (PSSM) of the protein; the amino-acid sequence; and a matrix representation of the protein, of dimension (length of the protein)×20, obtained by using substitution matrices to represent each amino acid as a vector. A valuable result is that a texture descriptor can be extracted from the PSSM protein representation that improves on the performance of standard descriptors based on the PSSM representation. Experimentally, we develop our systems by comparing several protein descriptors on nine different datasets. Each descriptor is used to train a support vector machine (SVM) or an ensemble of SVMs. Although different stand-alone descriptors work well on some datasets (but not on others), we have found that fusion among classifiers trained using different descriptors obtains good performance across all the tested datasets. The MATLAB code and datasets used in this paper are available at bias.csr.unibo.it\nanni\PSSM.rar.
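
As a rough illustration of the (length of the protein)×20 matrix representation mentioned in the abstract, the following Python sketch maps each residue to a 20-element row taken from a substitution matrix. It is not the authors' MATLAB code. The function names, the column order, and the toy match/mismatch scoring are all assumptions made for this example; in practice a real substitution matrix such as BLOSUM62 would replace the placeholder.

```python
# Standard 20 amino acids in a fixed column order (an assumption for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_substitution_matrix(match=1, mismatch=0):
    """Hypothetical placeholder for a real substitution matrix (e.g. BLOSUM62).

    Returns a nested dict so that matrix[a][b] is the score for replacing a with b.
    """
    return {
        a: {b: (match if a == b else mismatch) for b in AMINO_ACIDS}
        for a in AMINO_ACIDS
    }


def protein_to_matrix(sequence, sub_matrix):
    """Represent a protein as a list of len(sequence) rows of 20 scores each.

    Row i is the substitution-matrix row of the i-th residue, so every
    amino acid is represented as a 20-dimensional vector.
    """
    rows = []
    for residue in sequence:
        if residue not in sub_matrix:
            raise ValueError(f"unknown residue: {residue!r}")
        rows.append([sub_matrix[residue][b] for b in AMINO_ACIDS])
    return rows


if __name__ == "__main__":
    matrix = protein_to_matrix("ACDW", toy_substitution_matrix())
    print(len(matrix), "rows x", len(matrix[0]), "columns")
```

The number of rows varies with protein length, so a fixed-length descriptor still has to be extracted from this matrix before it can be used to train a classifier such as an SVM.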
Loris Nanni;Alessandra Lumini;Sheryl Brahnam

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/142733