Unsupervised Descriptive Text Mining for Knowledge Graph Learning

Giacomo Frisoni; Gianluca Moro; Antonella Carbonaro
2020

Abstract

The use of knowledge graphs (KGs) in advanced applications is growing steadily, owing to their ability to model large collections of semantically interconnected data. The extraction of relational facts from plain text is currently one of the main approaches for the construction and expansion of KGs. In this paper, we introduce a novel unsupervised and automatic technique for KG learning from corpora of short unstructured and unlabeled texts. Our approach is unique in that it starts from raw textual data and goes on to: i) identify a set of relevant domain-dependent terms; ii) extract aggregate and statistically significant semantic relationships between terms, documents and classes; iii) represent the accurate probabilistic knowledge as a KG; iv) extend and integrate the KG according to the Linked Open Data vision. The proposed solution is easily transferable to many domains and languages as long as the data are available. As a case study, we demonstrate how it is possible to automatically learn a KG representing the knowledge contained within the conversational messages shared on social networks such as Facebook by patients with rare diseases, and the impact this can have on creating resources aimed at capturing the “voice of patients”.
In: Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020), pp. 316-324
Files in this item:
File: 101536.pdf (open access)
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 1.12 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/780119