
An empirical study for finding a robust ensemble of classifiers for protein classification / Nanni, Loris; Lumini, Alessandra. - Print. - (2011), pp. 53-72.

An empirical study for finding a robust ensemble of classifiers for protein classification

NANNI, LORIS;LUMINI, ALESSANDRA
2011

Abstract

Over the last decade the amount of available protein data has grown tremendously. Machine learning, which is concerned with the automatic acquisition of models from data and with the use of such models for automatic inference and prediction, can be very useful in interpreting protein data. Machine learning is a subset of pattern recognition techniques in which the parameters of a given approach are obtained by analyzing a given dataset. Considerable effort is currently needed to develop a reliable system for classifying proteins, and to this end several methods are being developed for extracting features from proteins and for classifying them. Unfortunately, almost all of these methods have been tested on only one problem, and a comparison of different papers makes it clear that the best results on different problems are obtained by different methods. The aim of this work is to find a method, or an ensemble of methods, that works well across different problems. We study several feature extraction approaches for representing proteins, which are combined, evaluated, and compared on three datasets: a protein-protein interaction problem for the human gastric bacterium Helicobacter pylori; a protein-protein interaction problem on a human dataset; and the submitochondrial localization of a given mitochondrial protein. A number of statistically robust observations are obtained regarding the effectiveness of the proposed system.
Published in: Protein Engineering: Design, Selection and Applications, pp. 53-72

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/96982