CRIS Current Research Information System

The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy, and avoid the leakage of private content, it is important to treat sensitive information. However, any treatment requires firstly to identify sensitive text, and appropriate techniques to do it automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANN) or transformers based on them. Our research focuses on identifying categories of personal data in informal English sentences, by adopting a new logical-symbolic approach, and eventually hybridising it with ANN models. We present a frame-based knowledge graph built for personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames, and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%. By comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis, and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANNs SID tested in a previous study by the authors.

Gambarelli G., Gangemi A. (2022). PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data. BIG DATA AND COGNITIVE COMPUTING, 6(3), 1-19 [10.3390/bdcc6030090].

PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data

Gambarelli G.;Gangemi A.

2022

Abstract

The pervasiveness of dialogue systems and virtual conversation applications raises an important theme: the potential of sharing sensitive information, and the consequent need for protection. To guarantee the subject’s right to privacy, and avoid the leakage of private content, it is important to treat sensitive information. However, any treatment requires firstly to identify sensitive text, and appropriate techniques to do it automatically. The Sensitive Information Detection (SID) task has been explored in the literature in different domains and languages, but there is no common benchmark. Current approaches are mostly based on artificial neural networks (ANN) or transformers based on them. Our research focuses on identifying categories of personal data in informal English sentences, by adopting a new logical-symbolic approach, and eventually hybridising it with ANN models. We present a frame-based knowledge graph built for personal data categories defined in the Data Privacy Vocabulary (DPV). The knowledge graph is designed through the logical composition of already existing frames, and has been evaluated as background knowledge for a SID system against a labeled sensitive information dataset. The accuracy of PRIVAFRAME reached 78%. By comparison, a transformer-based model achieved 12% lower performance on the same dataset. The top-down logical-symbolic frame-based model allows a granular analysis, and does not require a training dataset. These advantages lead us to use it as a layer in a hybrid model, where the logical SID is combined with an ANNs SID tested in a previous study by the authors.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2022
			
	Rivista
	
				BIG DATA AND COGNITIVE COMPUTING
			
	Codice DOI
	
				https://dx.doi.org/10.3390/bdcc6030090
			
	Citazione
	
				Gambarelli G.,  Gangemi A. (2022). PRIVAFRAME: A Frame-Based Knowledge Graph for Sensitive Personal Data. BIG DATA AND COGNITIVE COMPUTING, 6(3), 1-19 [10.3390/bdcc6030090].
			
	Tutti gli autori
	
						Gambarelli G.; Gangemi A.
					
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
PRIVAFRAME.pdf accesso aperto Descrizione: Articolo Tipo: Versione (PDF) editoriale Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY) Dimensione 704.52 kB Formato Adobe PDF Visualizza/Apri	704.52 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/899026

Citazioni

ND

6

2

social impact