Almost 50 million people are living with dementia in 2018 worldwide, and the number will double every 20 years. The effectiveness of existing pharmacologic treatments for the disease is limited to symptoms control, and none of them are able to prevent, reverse or turn off the neurodegenerative process that leads to dementia; therefore, a prompt detection of the “disease signature” is a key problem, in order to develop and test new drugs and to support the management of clinical and domestic context. Recent studies showed that linguistic alterations may be one of the earliest signs of the pathology, years before other neurocognitive deficits become evident. Traditional tests fail to identify these slight but noticeable changes; whereas, the analysis of spoken language productions by Natural Language Processing (NLP) techniques can ecologically and inexpensively identify minor language modifications in potential patients. This interdisciplinary study aims at quantifying and describing alterations of linguistic features due to cognitive decline and build an automatic system for early diagnosis and screening purpose. To this aim, we enrolled 96 participants: 48 healthy controls and 48 impaired subjects. Of the latter, 32 was diagnosed with Mild Cognitive Impairment and 16 with early Dementia (eD). Each subject underwent a brief neuropsychological screening, and samples of semi-spontaneous speech productions was collected by means of three elicitation tasks. Recorded sessions were orthographically transcribed, PoS tagged and parsed building two different corpora: in the first we kept the automatic annotations, while in the second the transcripts were manually corrected in order to remove all mistakes. A multidimensional parameter computation was performed on the data, taking into consideration a set of 87 acoustical, rhythmical, morpho-syntactic and lexical feature as well as some readability indexes and demographic information. After these preparatory steps, some automatic classifiers were trained to distinguish healthy controls from MCI subjects employing two different algorithms, Support Vector (SVC) and Random Forest Classifiers (RFC). Our system was able to distinguish between controls and MCI subjects exhibiting high F1 scores, around 75%, thus it seems to be a promising approach for the identification of preclinical stages of dementia.

Linguistic features and automatic classifiers for identifying mild cognitive impairment and dementia

Calzà Laura;Gagliardi Gloria;Rossini Favretti Rema;Tamburini Fabio
2021

Abstract

Almost 50 million people are living with dementia in 2018 worldwide, and the number will double every 20 years. The effectiveness of existing pharmacologic treatments for the disease is limited to symptoms control, and none of them are able to prevent, reverse or turn off the neurodegenerative process that leads to dementia; therefore, a prompt detection of the “disease signature” is a key problem, in order to develop and test new drugs and to support the management of clinical and domestic context. Recent studies showed that linguistic alterations may be one of the earliest signs of the pathology, years before other neurocognitive deficits become evident. Traditional tests fail to identify these slight but noticeable changes; whereas, the analysis of spoken language productions by Natural Language Processing (NLP) techniques can ecologically and inexpensively identify minor language modifications in potential patients. This interdisciplinary study aims at quantifying and describing alterations of linguistic features due to cognitive decline and build an automatic system for early diagnosis and screening purpose. To this aim, we enrolled 96 participants: 48 healthy controls and 48 impaired subjects. Of the latter, 32 was diagnosed with Mild Cognitive Impairment and 16 with early Dementia (eD). Each subject underwent a brief neuropsychological screening, and samples of semi-spontaneous speech productions was collected by means of three elicitation tasks. Recorded sessions were orthographically transcribed, PoS tagged and parsed building two different corpora: in the first we kept the automatic annotations, while in the second the transcripts were manually corrected in order to remove all mistakes. A multidimensional parameter computation was performed on the data, taking into consideration a set of 87 acoustical, rhythmical, morpho-syntactic and lexical feature as well as some readability indexes and demographic information. After these preparatory steps, some automatic classifiers were trained to distinguish healthy controls from MCI subjects employing two different algorithms, Support Vector (SVC) and Random Forest Classifiers (RFC). Our system was able to distinguish between controls and MCI subjects exhibiting high F1 scores, around 75%, thus it seems to be a promising approach for the identification of preclinical stages of dementia.
2021
Calzà Laura, Gagliardi Gloria, Rossini Favretti Rema, Tamburini Fabio
File in questo prodotto:
File Dimensione Formato  
linguistic features.pdf

accesso riservato

Descrizione: articolo
Tipo: Versione (PDF) editoriale
Licenza: Licenza per accesso riservato
Dimensione 1.38 MB
Formato Adobe PDF
1.38 MB Adobe PDF   Visualizza/Apri   Contatta l'autore
Tamburini_CSL2021.pdf

accesso aperto

Descrizione: articolo
Tipo: Postprint
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione - Non commerciale - Non opere derivate (CCBYNCND)
Dimensione 1.34 MB
Formato Adobe PDF
1.34 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/806534
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 44
  • ???jsp.display-item.citation.isi??? 30
social impact