Current research in machine learning and artificial intelligence centers largely on modeling and performance evaluation, and less on data collection. However, recent research has demonstrated that limitations and biases in data may negatively impact trustworthiness and reliability. These issues are particularly consequential in sensitive domains such as mental health and neurological disorders, where speech data are used to develop AI applications for patients and healthcare providers. In this paper, we chart the landscape of available speech datasets for this domain to highlight possible pitfalls and opportunities for improvement and to promote fairness and diversity. We present a comprehensive list of desiderata for building speech datasets for mental health and neurological disorders and distill it into an actionable checklist focused on ethical concerns to foster more responsible research.

Mancini, E., Tanevska, A., Galassi, A., Galatolo, A., Ruggeri, F., Torroni, P. (2025). Promoting the Responsible Development of Speech Datasets for Mental Health and Neurological Disorders Research. The Journal of Artificial Intelligence Research, 82, 937-972 [10.1613/jair.1.16406].

Promoting the Responsible Development of Speech Datasets for Mental Health and Neurological Disorders Research

Mancini, Eleonora; Galassi, Andrea; Ruggeri, Federico; Torroni, Paolo
2025

Mancini, Eleonora; Tanevska, Ana; Galassi, Andrea; Galatolo, Alessio; Ruggeri, Federico; Torroni, Paolo
Files in this item:
File: JAIR Promoting.pdf (open access)
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 352.09 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1005631