The recent spread of low-cost and high-quality RGB-D and infrared sensors has supported the development of Natural User Interfaces (NUIs) in which the interaction is carried without the use of physical devices such as keyboards and mouse. In this paper, we propose a NUI based on dynamic hand gestures, acquired with RGB, depth and infrared sensors. The system is developed for the challenging automotive context, aiming at reducing the driver’s distraction during the driving activity. Specifically, the proposed framework is based on a multimodal combination of Convolutional Neural Networks whose input is represented by depth and infrared images, achieving a good level of light invariance, a key element in vision-based in-car systems. We test our system on a recent multimodal dataset collected in a realistic automotive setting, placing the sensors in an innovative point of view, i.e., in the tunnel console looking upwards. The dataset consists of a great amount of labelled frames containing 12 dynamic gestures performed by multiple subjects, making it suitable for deep learning-based approaches. In addition, we test the system on a different well-known public dataset, created for the interaction between the driver and the car. Experimental results on both datasets reveal the efficacy and the real-time performance of the proposed method.

Multimodal Hand Gesture Classification for the Human-Car Interaction

Guido Borghi;Rita Cucchiara
2020

Abstract

The recent spread of low-cost and high-quality RGB-D and infrared sensors has supported the development of Natural User Interfaces (NUIs) in which the interaction is carried without the use of physical devices such as keyboards and mouse. In this paper, we propose a NUI based on dynamic hand gestures, acquired with RGB, depth and infrared sensors. The system is developed for the challenging automotive context, aiming at reducing the driver’s distraction during the driving activity. Specifically, the proposed framework is based on a multimodal combination of Convolutional Neural Networks whose input is represented by depth and infrared images, achieving a good level of light invariance, a key element in vision-based in-car systems. We test our system on a recent multimodal dataset collected in a realistic automotive setting, placing the sensors in an innovative point of view, i.e., in the tunnel console looking upwards. The dataset consists of a great amount of labelled frames containing 12 dynamic gestures performed by multiple subjects, making it suitable for deep learning-based approaches. In addition, we test the system on a different well-known public dataset, created for the interaction between the driver and the car. Experimental results on both datasets reveal the efficacy and the real-time performance of the proposed method.
Andrea D’Eusanio; Alessandro Simoni; Stefano Pini; Guido Borghi; Roberto Vezzani; Rita Cucchiara
File in questo prodotto:
File Dimensione Formato  
informatics-07-00031-v2 (2).pdf

accesso aperto

Tipo: Versione (PDF) editoriale
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 2 MB
Formato Adobe PDF
2 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/858384
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 9
social impact