Machine Learning for HIV-1 Protease Cleavage Site Prediction

Lumini, Alessandra; Nanni, Loris

doi:10.1016/j.patrec.2006.01.014

Recently, several works have approached the HIV-1 protease specificity problem by applying classifier creation and combination methods from the field of machine learning, known as ensemble methods. However, it is still difficult for researchers to choose the best method because no effective comparison exists. For the first time, we have made an extensive study of methods for feature extraction, feature transformation and multiclassifier systems (MCS) applied to the HIV-1 protease problem. In this work we report an experimental comparison of several learning systems coupled with different feature representations. We confirm previous results stating that linear classifiers outperform non-linear classifiers when orthonormal encoding is used, but we also show that, with the Karhunen–Loève transform, the performance of neural networks is comparable to that of linear support vector machines. Finally, we propose a new hierarchical approach that, for the first time, combines ideas derived from machine learning methodologies with a knowledge base for this particular problem. This approach achieves a drastic error reduction with respect to linear classifiers: the error rate decreases from 9.1% using a linear SVM to 6.6% using our new hierarchical classifier based on a set of pattern rules.
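As an illustration of the orthonormal encoding mentioned above, the following minimal sketch maps each residue of a peptide to a 20-dimensional indicator vector and concatenates the results. It is not taken from the paper: the function name `orthonormal_encode`, the alphabet ordering, the use of an 8-residue window (a common choice in the cleavage-site literature) and the example peptide are all assumptions made here for illustration.

```python
# Hypothetical sketch of orthonormal (one-hot) peptide encoding.
# The alphabet order and the octamer window are assumptions, not
# details confirmed by the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthonormal_encode(peptide):
    """Return a flat 0/1 list with one 20-slot block per residue."""
    n = len(AMINO_ACIDS)
    vec = [0] * (len(peptide) * n)
    for pos, aa in enumerate(peptide):
        # Set the single slot for this residue inside its block.
        vec[pos * n + INDEX[aa]] = 1
    return vec


# Illustrative octamer; any 8-letter string over the alphabet works.
octamer = "SQNYPIVQ"
features = orthonormal_encode(octamer)
print(len(features), sum(features))
```

Under these assumptions an octamer becomes a sparse 160-dimensional binary vector, which is the kind of input a linear classifier or a feature transform such as Karhunen–Loève would then operate on.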

Lumini, A., Nanni, L. (2006). Machine Learning for HIV-1 Protease Cleavage Site Prediction. PATTERN RECOGNITION LETTERS, 27, 1537-1544 [10.1016/j.patrec.2006.01.014].