Automated multimodal sensemaking: Ontology-based integration of linguistic frames and visual data

Ciroku, Fiorela; De Giorgis, Stefano; Gangemi, Aldo; Martinez Pandiani, Delfina Sol; Presutti, Valentina

doi:10.1016/j.chb.2023.107997

Frame evocation from visual data is an essential process for multimodal sensemaking, because frame semantics provides a multimodal abstraction. However, data-driven approaches and tools to automate it are scarce. We propose a novel approach for explainable automated multimodal sensemaking that links linguistic frames to their physical visual occurrences, using ontology-based knowledge engineering techniques. We pair the evocation of linguistic frames from text with visual data, and call the resulting pairings “framal visual manifestations”. We present a deep ontological analysis of the implicit data model of the Visual Genome image dataset, and its formalization in the novel Visual Sense Ontology (VSO). To enhance the multimodal data from this dataset, we introduce a framal knowledge expansion pipeline that extracts linguistic frames, including values and emotions, and connects them to images, using multiple linguistic resources for disambiguation. We then introduce the Visual Sense Knowledge Graph (VSKG), a novel resource. VSKG is a knowledge graph, queryable via SPARQL, that enhances the accessibility and comprehensibility of Visual Genome's multimodal data. VSKG includes frame visual evocation data, enabling more advanced forms of explicit reasoning, analysis and sensemaking. Our work represents a significant advance in the automation of frame evocation and multimodal sensemaking, performed in a fully interpretable and transparent way, with potential applications in fields including knowledge representation, computer vision, and natural language processing.

Ciroku, F., De Giorgis, S., Gangemi, A., Martinez Pandiani, D.S., Presutti, V. (2024). Automated multimodal sensemaking: Ontology-based integration of linguistic frames and visual data. COMPUTERS IN HUMAN BEHAVIOR, 150, 79-97 [10.1016/j.chb.2023.107997].


Author contributions: Ciroku, Fiorela (Software); De Giorgis, Stefano (Formal Analysis); Gangemi, Aldo (Supervision); Martinez Pandiani, Delfina Sol (Investigation); Presutti, Valentina (Supervision)

2024



Year: 2024
Journal: Computers in Human Behavior
DOI: https://dx.doi.org/10.1016/j.chb.2023.107997


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/950958

Warning: the data displayed have not been validated by the university.

