Nanni, L., Lumini, A. (2011). An empirical study for finding a robust ensemble of classifiers for protein classification. Hauppauge, NY: Nova Publishers.
An empirical study for finding a robust ensemble of classifiers for protein classification
Nanni, Loris; Lumini, Alessandra
2011
Abstract
During the last decade, the amount of available protein data has grown tremendously. Machine learning, which is concerned with the automatic acquisition of models from data and with the use of such models for automatic inference and prediction, can be very useful in interpreting protein data. Machine learning is a subset of pattern recognition techniques in which the parameters of a given approach are obtained by analyzing a given dataset. Considerable effort is still needed to develop a reliable system for classifying proteins, and to this end several methods are being developed for extracting features from proteins and for classifying them. Unfortunately, almost all of these methods have been tested on only one problem, and a comparison of different papers makes it clear that the best results on different problems are obtained by different methods. The aim of this work is to find a method, or an ensemble of methods, that works well across different problems. We study several feature extraction approaches for representing proteins, which are combined, evaluated, and compared on three datasets: a protein-protein interaction problem for the human gastric bacterium Helicobacter pylori; a protein-protein interaction problem on a human dataset; and the submitochondrial localization of a given mitochondrial protein. A number of statistically robust observations are obtained regarding the effectiveness of the proposed system.