Objective In this paper we propose a new feature extractor for peptide/protein classification based on the calculation of texture descriptors. Representing a peptide/protein using a matrix descriptor, instead of a vector, allows to deal with the peptide/protein as an image and to use texture descriptors for representation purposes. Methods and materials A matrix descriptor, which is a squared matrix of the dimension of the peptide/protein, is obtained considering a partial ordering of the amino-acids of the peptide/protein according to their value of a given physicochemical property. Each matrix descriptor is considered as a texture image and several texture descriptors are considered to obtain a compact representation which is scale invariant (i.e. independent on the length of the peptideprotein). The texture descriptors tested in this work are: local binary patterns, discrete cosine transform and Daubechies wavelets. Results and conclusion The experimental section reports several tests, aimed at supporting our ideas, performed on the following datasets: vaccine dataset for the predictions of peptides that bind human leukocyte antigens; human immunodeficiency virus (HIV-1) protease cleavage site prediction dataset and membrane proteins type dataset. The experimental results confirm the usefulness of the novel descriptors: the performance obtained by our system on the three difficult datasets is quite high, indicating that the proposed method is a feasible system for extracting information from peptides and proteins. The performance obtained by each of the three texture descriptors calculated from the matrix-based representation, and coupled to a support vector machine classifier, is lower than the performance obtained by other vector-based descriptors based on physicochemical properties proposed in the literature. Anyway the new descriptors bring different information and our tests show that the texture descriptors and the vector-based descriptors can be combined to improve the overall performance of the system. In particular the proposed approach improves the state-of-the-art results in two out of three tested problems (HIV-1 protease cleavage site prediction dataset and membrane proteins type dataset).
L. Nanni, A. Lumini (2010). Coding of amino acids by texture descriptors. ARTIFICIAL INTELLIGENCE IN MEDICINE, 48, 43-50 [10.1016/j.artmed.2009.10.001].
Coding of amino acids by texture descriptors
NANNI, LORIS;LUMINI, ALESSANDRA
2010
Abstract
Objective In this paper we propose a new feature extractor for peptide/protein classification based on the calculation of texture descriptors. Representing a peptide/protein using a matrix descriptor, instead of a vector, allows to deal with the peptide/protein as an image and to use texture descriptors for representation purposes. Methods and materials A matrix descriptor, which is a squared matrix of the dimension of the peptide/protein, is obtained considering a partial ordering of the amino-acids of the peptide/protein according to their value of a given physicochemical property. Each matrix descriptor is considered as a texture image and several texture descriptors are considered to obtain a compact representation which is scale invariant (i.e. independent on the length of the peptideprotein). The texture descriptors tested in this work are: local binary patterns, discrete cosine transform and Daubechies wavelets. Results and conclusion The experimental section reports several tests, aimed at supporting our ideas, performed on the following datasets: vaccine dataset for the predictions of peptides that bind human leukocyte antigens; human immunodeficiency virus (HIV-1) protease cleavage site prediction dataset and membrane proteins type dataset. The experimental results confirm the usefulness of the novel descriptors: the performance obtained by our system on the three difficult datasets is quite high, indicating that the proposed method is a feasible system for extracting information from peptides and proteins. The performance obtained by each of the three texture descriptors calculated from the matrix-based representation, and coupled to a support vector machine classifier, is lower than the performance obtained by other vector-based descriptors based on physicochemical properties proposed in the literature. Anyway the new descriptors bring different information and our tests show that the texture descriptors and the vector-based descriptors can be combined to improve the overall performance of the system. In particular the proposed approach improves the state-of-the-art results in two out of three tested problems (HIV-1 protease cleavage site prediction dataset and membrane proteins type dataset).I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.