Wavelet selection for disease classification by DNA microarray data

NANNI, LORIS; LUMINI, ALESSANDRA
2011

Abstract

Microarrays measure the expression levels of tens of thousands of genes, but this high-dimensional feature vector also contains information that is irrelevant for accurate classification. Moreover, only a few training samples are available, so to avoid the curse of dimensionality, feature reduction should be performed before the classification step. Here, we propose using sets of orthogonal wavelet detail coefficients, obtained from different mother wavelets, to extract features from the microarray data. We propose a multi-classifier system in which each classifier, a support vector machine, is trained on a different set of detail coefficients; the classifiers are combined by the sum rule. The sets of detail coefficients are selected by running Sequential Forward Floating Selection (SFFS). The proposed method is validated using the area under the ROC curve as the performance indicator, with experiments carried out on four datasets: Breast, Ovarian, Lung, and Prostate. The results show that the proposed method outperforms what can be obtained with a single set of detail coefficients. Moreover, we show that, even when the detail coefficients are used as features, a random subspace of classifiers outperforms the stand-alone classifiers.
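
The building blocks named in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not in the original: a one-level Haar transform stands in for the mother wavelets (the abstract does not say which ones are used), the per-classifier scores are taken as given instead of coming from trained support vector machines, and SFFS is omitted. The function names `haar_detail`, `sum_rule`, and `auc` are hypothetical.

```python
import math


def haar_detail(signal):
    """One-level orthonormal Haar detail coefficients.

    Assumption: Haar is only a stand-in wavelet; sign conventions
    differ between libraries. A trailing unpaired sample is ignored.
    """
    scale = 1.0 / math.sqrt(2.0)
    return [(signal[i] - signal[i + 1]) * scale
            for i in range(0, len(signal) - 1, 2)]


def sum_rule(score_lists):
    """Sum rule: add the scores that each classifier gives to each sample.

    score_lists holds one list per classifier, all of the same length
    (one score per sample).
    """
    return [sum(per_sample) for per_sample in zip(*score_lists)]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy expression profile (made-up values, not from any dataset).
    profile = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0]
    print(haar_detail(profile))

    # Made-up scores from two hypothetical classifiers on four samples.
    labels = [0, 0, 1, 1]
    fused = sum_rule([[0.2, 0.3, 0.6, 0.9], [0.4, 0.1, 0.5, 0.8]])
    print(auc(fused, labels))
```

In a fuller version, each score list would come from a classifier trained on the detail coefficients of one wavelet, and a selection procedure such as SFFS would decide which score lists enter `sum_rule`.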

Use this identifier to cite or link to this document: http://hdl.handle.net/11585/96820