Andrea D’Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, Rita Cucchiara (2020). A Transformer-Based Network for Dynamic Hand Gesture Recognition [10.1109/3DV50981.2020.00072].

A Transformer-Based Network for Dynamic Hand Gesture Recognition

Andrea D’Eusanio; Alessandro Simoni; Stefano Pini; Guido Borghi; Roberto Vezzani; Rita Cucchiara

2020

Abstract

Transformer-based neural networks rely on self-attention and achieve state-of-the-art results in language understanding and sequence modeling. However, their application to visual data, and in particular to dynamic hand gesture recognition, has not yet been investigated in depth. In this paper, we propose a transformer-based architecture for this task. We show that using a single active depth sensor, specifically its depth maps and the surface normals estimated from them, achieves state-of-the-art results, outperforming all methods available in the literature on two automotive datasets, NVidia Dynamic Hand Gesture and Briareo. We also test the method with other data types available from common RGB-D devices, such as infrared and color data. Finally, we assess inference time and number of parameters, showing that the proposed framework is suitable for an online in-car infotainment system.
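
The abstract mentions surface normals estimated from depth maps but does not say how they are computed. As a minimal sketch only, the following shows one common approximation: take finite-difference depth gradients and normalize the vector (-dz/dx, -dz/dy, 1). The function name `estimate_normals`, the list-of-lists input, and the choice to ignore camera intrinsics are assumptions made for illustration, not the authors' method.

```python
import math


def estimate_normals(depth):
    """Return a unit normal (nx, ny, nz) for every pixel of a depth map.

    Assumption: normals are approximated from image-space gradients,
    n ~ (-dz/dx, -dz/dy, 1), with gradients in depth units per pixel.
    Camera intrinsics are ignored in this sketch.
    """
    rows, cols = len(depth), len(depth[0])
    normals = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Central differences inside the image, one-sided at the borders.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            dzdx = (depth[y][x1] - depth[y][x0]) / max(x1 - x0, 1)
            dzdy = (depth[y1][x] - depth[y0][x]) / max(y1 - y0, 1)
            # Normalize so each normal has unit length.
            norm = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            row.append((-dzdx / norm, -dzdy / norm, 1.0 / norm))
        normals.append(row)
    return normals
```

For a flat surface at constant depth, every normal points straight along the z axis; a surface whose depth grows along x yields normals tilted toward negative x. The resulting three-channel map could then be treated as an image-like input alongside the depth map.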

Year: 2020
Volume title: 2020 International Conference on 3D Vision (3DV 2020)
First page: 623
Last page: 632
DOI: https://dx.doi.org/10.1109/3DV50981.2020.00072
Citation: Andrea D’Eusanio, Alessandro Simoni, Stefano Pini, Guido Borghi, Roberto Vezzani, Rita Cucchiara (2020). A Transformer-Based Network for Dynamic Hand Gesture Recognition [10.1109/3DV50981.2020.00072].
All authors: Andrea D’Eusanio; Alessandro Simoni; Stefano Pini; Guido Borghi; Roberto Vezzani; Rita Cucchiara
Publication type: 4.01 Contribution in conference proceedings

Files in this record:

File	Access	Type	License	Size	Format
3DV_2020.pdf	Open access	Postprint	Free open-access license	694.35 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/859651
