Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Risso, Matteo; Burrello, Alessio; Benini, Luca; Macii, Enrico; Poncino, Massimo; Pagliari, Daniele Jahier

doi:10.1109/IGSC55832.2022.9969373

Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially with optimized bit-width assignments determined by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision quantization works layer-wise, i.e., it uses different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning a higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs. model size and accuracy vs. energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce the memory and energy for inference by up to 63% and 27%, respectively, compared to a layer-wise approach, for the same accuracy.
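As a rough illustration of the difference between layer-wise and channel-wise bit-width assignment described in the abstract, the sketch below quantizes each output channel of a toy weight tensor at its own precision and compares the resulting weight storage. It is not the authors' NAS: the bit-widths are fixed by hand rather than searched, and the function names, the toy weights, the candidate bit-widths (2, 4, 8), and the symmetric uniform quantizer are all assumptions made for the example.

```python
from typing import List


def quantize_channel(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform fake-quantization of one output channel (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1  # largest positive integer level, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / levels  # one scale factor per channel
    return [round(w / scale) * scale for w in weights]


def weight_bits(channel_sizes: List[int], bit_widths: List[int]) -> int:
    """Total weight storage in bits for a given per-channel bit-width assignment."""
    return sum(n * b for n, b in zip(channel_sizes, bit_widths))


# Hypothetical layer: 4 output channels with 3 weights each.
layer = [
    [0.90, -0.75, 0.10],
    [0.02, -0.01, 0.03],
    [0.50, 0.25, -0.50],
    [0.01, 0.00, -0.02],
]
sizes = [len(channel) for channel in layer]

# Layer-wise: every channel shares one bit-width.
layer_wise = [8, 8, 8, 8]
# Channel-wise: hand-picked assignment standing in for a searched one.
channel_wise = [8, 2, 4, 2]

quantized = [quantize_channel(ch, b) for ch, b in zip(layer, channel_wise)]
layer_wise_bits = weight_bits(sizes, layer_wise)
channel_wise_bits = weight_bits(sizes, channel_wise)

print(f"layer-wise storage:   {layer_wise_bits} bits")
print(f"channel-wise storage: {channel_wise_bits} bits")
```

In this toy case the channel-wise assignment halves the weight storage. Whether accuracy is preserved depends on which channels are kept at high precision, which is the decision the paper delegates to its search procedure.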

Risso, M., Burrello, A., Benini, L., Macii, E., Poncino, M., Pagliari, D.J. (2022). Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes. 345 E 47TH ST, NEW YORK, NY 10017 USA : IEEE [10.1109/IGSC55832.2022.9969373].



Year: 2022
Volume title: 2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC)
DOI: https://dx.doi.org/10.1109/IGSC55832.2022.9969373


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/956644
