
Enabling mixed-precision quantized neural networks in extreme-edge devices

Nazareno Bruschi; Angelo Garofalo; Francesco Conti; Giuseppe Tagliavini; Davide Rossi
2020

Abstract

The deployment of Quantized Neural Networks (QNNs) on advanced microcontrollers requires optimized software to exploit the digital signal processing (DSP) extensions of modern instruction set architectures (ISAs). To this end, recent research has proposed optimized libraries for QNNs (from 8-bit down to 2-bit) such as CMSIS-NN and PULP-NN. This work presents an extension to the PULP-NN library targeting the acceleration of mixed-precision deep neural networks, an emerging paradigm able to significantly shrink the memory footprint of deep neural networks with negligible accuracy loss. The library, composed of 27 kernels, one for each combination of input feature map, weight, and output feature map precision (considering 8-bit, 4-bit, and 2-bit), enables efficient inference of QNNs on parallel ultra-low-power (PULP) clusters of RISC-V based processors featuring the RV32IMCXpulpV2 ISA. The proposed solution, benchmarked on an 8-core GAP-8 PULP cluster, reaches a peak performance of 16 MACs/cycle on 8 cores, running 21× to 25× faster than an STM32H7 (powered by an ARM Cortex-M7 processor) with 15× to 21× better energy efficiency.
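To illustrate the idea behind such a mixed-precision kernel, the sketch below shows, in portable C, the inner loop of an 8-bit-activation × 4-bit-weight dot product: sub-byte weights are unpacked to 8-bit operands and consumed by a 4-way sum-of-dot-product, mimicking the sdotp-style SIMD instructions of the XpulpV2 ISA. This is a minimal sketch under assumed conventions; the nibble packing layout and all names (unpack_w4, sdotp4, dot_s8_w4) are hypothetical and not the PULP-NN API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: unpack four signed 4-bit weights, packed in
 * little-endian nibble order inside one uint16_t, into four int8_t. */
static inline void unpack_w4(uint16_t packed, int8_t w[4]) {
    for (int i = 0; i < 4; i++) {
        int8_t nib = (packed >> (4 * i)) & 0xF;
        w[i] = (nib ^ 0x8) - 0x8; /* sign-extend the 4-bit value */
    }
}

/* 4-way multiply-accumulate, standing in for an XpulpV2
 * sum-of-dot-product SIMD instruction on int8 vectors. */
static inline int32_t sdotp4(const int8_t a[4], const int8_t b[4],
                             int32_t acc) {
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Inner loop of one (8-bit activations x 4-bit weights) kernel;
 * n is assumed to be a multiple of 4. */
int32_t dot_s8_w4(const int8_t *x, const uint16_t *w_packed, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        int8_t w[4];
        unpack_w4(w_packed[i / 4], w);
        acc = sdotp4(&x[i], w, acc);
    }
    return acc;
}

int main(void) {
    int8_t x[4] = {1, -2, 3, 4};
    /* weights 1, -1, 2, -2 packed as nibbles 0x1, 0xF, 0x2, 0xE */
    uint16_t w = 0xE2F1;
    printf("%d\n", dot_s8_w4(x, &w, 4)); /* 1 + 2 + 6 - 8 = 1 */
    return 0;
}
```

On real XpulpV2 hardware, the portable loops above would be replaced by packed-SIMD dot-product instructions, which is how the library reaches multiple MACs per cycle per core.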
17th ACM International Conference on Computing Frontiers 2020, CF 2020 - Proceedings, pp. 217-220
Enabling mixed-precision quantized neural networks in extreme-edge devices / Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, Davide Rossi. - ELECTRONIC. - (2020), pp. 217-220. (Paper presented at the 17th ACM International Conference on Computing Frontiers, CF 2020, held in Catania, Italy, 11-13 May 2020) [10.1145/3387902.3394038].
Files in this record:
Enabling_Mixed_Precision_Quantized_Neural_Networks_in_Extreme_Edge_Devices.pdf
Type: Postprint
Access: Open access
License: Free open-access license
Size: 1.7 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/761813
Citations
  • PMC: ND
  • Scopus: 12
  • Web of Science: 12