

7 μJ/inference end-to-end gesture recognition from dynamic vision sensor data using ternarized hybrid convolutional neural networks

Rutishauser, Georg; Scherer, Moritz; Fischer, Tim; Benini, Luca
2023

Abstract

Dynamic vision sensor (DVS) cameras enable energy-activity-proportional visual sensing by propagating only the events produced by changes in the observed scene. Furthermore, by generating these events asynchronously, they offer μs-scale latency while eliminating the redundant data transmission inherent to classical, frame-based cameras. However, the potential of DVS to improve the energy efficiency of IoT sensor nodes can only be fully realized with efficient and flexible systems that tightly integrate sensing, processing, and actuation capabilities. In this paper, we propose a complete end-to-end pipeline for DVS event data classification implemented on the Kraken parallel ultra-low power (PULP) system-on-chip and apply it to gesture recognition. A dedicated on-chip peripheral interface for DVS cameras aggregates the received events into ternary event frames. We process these event frames with a fully ternarized two-stage temporal convolutional network (TCN). The neural network can be executed either on Kraken’s PULP cluster of general-purpose RISC-V cores or on CUTIE, the on-chip ternary neural network accelerator. We perform extensive ablations on network structure, training, and data generation parameters. We achieve a validation accuracy of 97.7% on the DVS128 11-class gesture dataset, a new record for embedded implementations. With in-silicon power and energy measurements, we demonstrate a classification energy of 7 μJ at a latency of 0.9 ms when running the TCN on CUTIE, a reduction in inference energy compared to the state of the art in embedded gesture recognition. The processing system consumes as little as 4.7 mW in continuous inference, enabling always-on gesture recognition and closing the gap between the efficiency potential of DVS cameras and its exploitation in application scenarios.
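To make the event-aggregation step concrete, the following is a minimal Python sketch of how asynchronous DVS events can be binned into ternary event frames like those the paper's on-chip peripheral interface produces. The tuple layout, the window length, and all function names are illustrative assumptions, not the paper's implementation; on Kraken, this aggregation runs in dedicated hardware, and each pixel of the resulting frame carries only a value in {-1, 0, +1}, matching the alphabet a ternary network accelerator such as CUTIE operates on.

```python
import numpy as np

def events_to_ternary_frame(events, width=128, height=128):
    """Aggregate one time window of DVS events into a ternary frame.

    events: iterable of (x, y, polarity) tuples, polarity in {+1, -1}.
    Returns an int8 array with values in {-1, 0, +1}: the sign of the
    net polarity accumulated at each pixel (0 where no event occurred).
    Hypothetical sketch; 128x128 matches the DVS128 sensor resolution.
    """
    acc = np.zeros((height, width), dtype=np.int32)
    for x, y, p in events:
        acc[y, x] += p
    return np.sign(acc).astype(np.int8)

def stream_to_frames(events, window_us=10_000):
    """Split a timestamp-sorted stream of (x, y, t, polarity) events
    into consecutive fixed-length windows (window_us is an assumed,
    tunable parameter) and ternarize each window into one frame."""
    frames, window, t0 = [], [], None
    for x, y, t, p in events:
        if t0 is None:
            t0 = t
        if t - t0 >= window_us:  # window complete: emit a frame
            frames.append(events_to_ternary_frame(window))
            window, t0 = [], t
        window.append((x, y, p))
    if window:  # flush the last, possibly partial window
        frames.append(events_to_ternary_frame(window))
    return frames
```

A sequence of such frames is then what the two-stage TCN consumes; the window length trades off temporal resolution against frame rate and is one of the data-generation parameters the paper ablates.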
Rutishauser, Georg; Scherer, Moritz; Fischer, Tim; Benini, Luca. 7 μJ/inference end-to-end gesture recognition from dynamic vision sensor data using ternarized hybrid convolutional neural networks. In: Future Generation Computer Systems, vol. 149 (2023), pp. 717-731. ISSN 0167-739X. DOI: 10.1016/j.future.2023.07.017

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/956437