
CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams / Cavigelli L.; Benini L. - In: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY. - ISSN 1051-8215. - Print. - 30:5(2020), pp. 1451-1465, art. no. 8661604. [10.1109/TCSVT.2019.2903421]

CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

Cavigelli L.; Benini L.
2020

Abstract

The last few years have brought rapid advances in computer vision, grounded in new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing network complexity for real-time inference on embedded computing platforms. We adopt an orthogonal viewpoint and propose a novel algorithm exploiting the spatio-temporal sparsity of pixel changes. For a semantic segmentation application, this optimized inference procedure resulted in an average speed-up of 9.1X over cuDNN on the Tegra X2 platform at a negligible accuracy loss of < 0.1% and without retraining the network. Similarly, an average speed-up of 7.0X was achieved for a pose detection DNN, and a 5X reduction in the number of arithmetic operations for object detection on static-camera video surveillance data. These throughput gains, combined with lower power consumption, result in an energy efficiency of 511 GOp/s/W compared to 70 GOp/s/W for the baseline.
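The core idea the abstract describes, namely recomputing convolution outputs only where the input changed between consecutive frames, can be illustrated with a minimal NumPy sketch. This is a simplified single-channel version under assumed semantics (a fixed per-pixel change threshold `tau`, valid-mode convolution); the function names are illustrative and not the paper's actual API or CUDA implementation.

```python
import numpy as np

def conv2d(x, w):
    # Dense valid-mode 2D convolution (single channel): the baseline cost.
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def change_based_conv2d(prev_in, prev_out, new_in, w, tau=1e-3):
    # Change-based update: detect which input pixels changed by more
    # than tau, then recompute only the outputs whose receptive field
    # contains at least one changed pixel; all others are reused.
    kh, kw = w.shape
    changed = np.abs(new_in - prev_in) > tau   # change-detection mask
    out = prev_out.copy()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            if changed[i:i + kh, j:j + kw].any():
                out[i, j] = np.sum(new_in[i:i + kh, j:j + kw] * w)
    return out
```

When only a few pixels change between frames (the typical static-camera case), the second function touches only a small neighborhood of outputs while producing the same result as a full recomputation, which is where the reported operation-count and throughput savings come from.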
Files in this record:

CBinfer_aam.pdf

Open Access since 01/12/2020

Description: author accepted manuscript (aam)
Type: Postprint
License: Open Access license. Creative Commons Attribution - NonCommercial (CC BY-NC)
Size: 8.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/791946
Citations
  • Scopus: 14
  • Web of Science: 13