: One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables-a signature-for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by an up/down regulation behavior, for which discriminant-based methods could perform with high accuracy and easy interpretability. To obtain the most out of these methods features selection is even more critical, but it is known to be a NP-hard problem, and thus most feature selection approaches focuses on one feature at the time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best performing feature pairs. The algorithm is easily scalable, allowing efficient computing for high number of observables ([Formula: see text]-[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.
Curti, N., Levi, G., Giampieri, E., Castellani, G., Remondini, D. (2022). A network approach for low dimensional signatures from high throughput data. SCIENTIFIC REPORTS, 12(1), 1-9 [10.1038/s41598-022-25549-9].
A network approach for low dimensional signatures from high throughput data
Curti, Nico;Levi, Giuseppe;Giampieri, Enrico
;Castellani, Gastone;Remondini, Daniel
2022
Abstract
: One of the main objectives of high-throughput genomics studies is to obtain a low-dimensional set of observables-a signature-for sample classification purposes (diagnosis, prognosis, stratification). Biological data, such as gene or protein expression, are commonly characterized by an up/down regulation behavior, for which discriminant-based methods could perform with high accuracy and easy interpretability. To obtain the most out of these methods features selection is even more critical, but it is known to be a NP-hard problem, and thus most feature selection approaches focuses on one feature at the time (k-best, Sequential Feature Selection, recursive feature elimination). We propose DNetPRO, Discriminant Analysis with Network PROcessing, a supervised network-based signature identification method. This method implements a network-based heuristic to generate one or more signatures out of the best performing feature pairs. The algorithm is easily scalable, allowing efficient computing for high number of observables ([Formula: see text]-[Formula: see text]). We show applications on real high-throughput genomic datasets in which our method outperforms existing results, or is compatible with them but with a smaller number of selected features. Moreover, the geometrical simplicity of the resulting class-separation surfaces allows a clearer interpretation of the obtained signatures in comparison to nonlinear classification models.File | Dimensione | Formato | |
---|---|---|---|
s41598-022-25549-9.pdf
accesso aperto
Descrizione: pdf manuscript
Tipo:
Versione (PDF) editoriale
Licenza:
Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione
2.81 MB
Formato
Adobe PDF
|
2.81 MB | Adobe PDF | Visualizza/Apri |
41598_2022_25549_MOESM1_ESM.pdf
accesso aperto
Tipo:
File Supplementare
Licenza:
Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione
4.02 MB
Formato
Adobe PDF
|
4.02 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.