Down syndrome (DS), or trisomy 21, is the most common genetic cause of intellectual disability (ID), but its pathogenic mechanism has not yet been identified. In a complex, non-monogenic condition such as DS, a clear correlation between cause and effect can be difficult to find with classical analysis methods, so different approaches are needed. The increased availability of big data has made it possible to use artificial intelligence (AI), and machine learning (ML) in particular, in the medical field. The purpose of this work is to apply ML techniques to clinical records obtained from subjects with DS and to study their association with ID. We applied two tree-based ML models (random forest and gradient boosting machine) to the research question of how to identify key features likely associated with ID in DS. We analyzed 109 features (or variables) in 106 subjects with DS. The outcome of the analysis was the age equivalent (AE) score, used as an indicator of intellectual functioning, which is impaired in ID. We applied several methods to configure the models: feature selection through the Boruta framework to minimize random correlation; data augmentation to overcome the issue of a small dataset; and age effect mitigation to take into account the chronological age of the subjects. The results show that ML algorithms can be applied with good accuracy to identify variables likely involved in cognitive impairment in DS. In particular, random forest and gradient boosting machine produce results with low error (MSE < 0.12) and an acceptable R² (0.70 and 0.93). Interestingly, the ranking of the variables points to several features of interest related to hearing, gastrointestinal alterations, thyroid state, immune system and vitamin B12, which deserve particular attention when improving care pathways for people with DS.
In conclusion, ML-based models may assist researchers in identifying key features likely correlated with ID in DS and, ultimately, may improve research efforts focused on identifying possible therapeutic targets and new care pathways. We believe this study can be the basis for further testing and validation of our algorithms with multiple and larger datasets.
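
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how mean squared error (MSE) and the coefficient of determination (R squared) are conventionally computed for a regression model. It is not the authors' code: the function names and the `observed` and `predicted` values are hypothetical and chosen only for illustration, and the abstract does not state on which scale the AE score was evaluated.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical, made-up values (NOT data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")  # prints "MSE = 0.125, R^2 = 0.900"
```

A lower MSE means predictions lie closer to the observed values, while an R squared closer to 1 means the model explains a larger share of the variance in the outcome.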

Machine learning based analysis for intellectual disability in Down syndrome / Baldo, Federico; Piovesan, Allison; Rakvin, Marijana; Ramacieri, Giuseppe; Locatelli, Chiara; Lanfranchi, Silvia; Onnivello, Sara; Pulina, Francesca; Caracausi, Maria; Antonaros, Francesca; Lombardi, Michele; Pelleri, Maria Chiara. - In: HELIYON. - ISSN 2405-8440. - ELECTRONIC. - 9:9(2023), pp. e19444.1-e19444.12. [10.1016/j.heliyon.2023.e19444]

Machine learning based analysis for intellectual disability in Down syndrome

Baldo, Federico (co-first author);
Piovesan, Allison (co-first author);
Ramacieri, Giuseppe; Caracausi, Maria; Antonaros, Francesca;
Lombardi, Michele (penultimate author);
Pelleri, Maria Chiara (last author)
2023

Files in this item:

File: Baldo 2023 ML DS.pdf
Access: open access
Type: Publisher's version (PDF)
License: Open access license. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 1.59 MB
Format: Adobe PDF

File: ScienceDirect_files_26Feb2024_14-48-17.874.zip
Access: open access
Type: Supplementary file
License: Open access license. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 964.47 kB
Format: Zip file

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/956282