QUENN: Quantization engine for low-power neural networks

De Prado, Miguel; Denna, Maurizio; Benini, Luca; Pazos, Nuria

doi:10.1145/3203217.3203282

Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand of computational resources required by deep neural networks may be alleviated by approximate computing techniques, and most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes comes in as an initiative to enable stakeholders to bring AI to low-power and autonomous environments such as: Automotive, Medical Healthcare and Consumer Electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of Deep Neural Networks on heterogeneous embedded devices. In this work, we detail the quantization engine that is integrated in LPDNN. The engine depends on a fine-grained workflow which enables a Neural Network Design Exploration and a sensitivity analysis of each layer for quantization. We demonstrate the engine with a case study on Alexnet and VGG16 for three different techniques for direct quantization: standard fixed-point, dynamic fixed-point and k-means clustering, and demonstrate the potential of the latter. We argue that using a Gaussian quantizer with k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve over 55.64% saving for weights' storage and 69.17% for run-time memory accesses with less than 1% drop in top5 accuracy in Imagenet.

De Prado, M., Denna, M., Benini, L., Pazos, N. (2018). QUENN: Quantization engine for low-power neural networks. Association for Computing Machinery, Inc [10.1145/3203217.3203282].

QUENN: Quantization engine for low-power neural networks

De Prado, Miguel;Denna, Maurizio;Benini, Luca;Pazos, Nuria

2018

Abstract

Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high demand of computational resources required by deep neural networks may be alleviated by approximate computing techniques, and most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes comes in as an initiative to enable stakeholders to bring AI to low-power and autonomous environments such as: Automotive, Medical Healthcare and Consumer Electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of Deep Neural Networks on heterogeneous embedded devices. In this work, we detail the quantization engine that is integrated in LPDNN. The engine depends on a fine-grained workflow which enables a Neural Network Design Exploration and a sensitivity analysis of each layer for quantization. We demonstrate the engine with a case study on Alexnet and VGG16 for three different techniques for direct quantization: standard fixed-point, dynamic fixed-point and k-means clustering, and demonstrate the potential of the latter. We argue that using a Gaussian quantizer with k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve over 55.64% saving for weights' storage and 69.17% for run-time memory accesses with less than 1% drop in top5 accuracy in Imagenet.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2018
			
	Titolo del volume
	
				2018 ACM International Conference on Computing Frontiers, CF 2018 - Proceedings
			
	Pagina iniziale
	
				36
			
	Pagina finale
	
				44
			
	Codice DOI
	
				https://dx.doi.org/10.1145/3203217.3203282
			
	Citazione
	
				De Prado, M., Denna, M., Benini, L., Pazos, N. (2018). QUENN: Quantization engine for low-power neural networks. Association for Computing Machinery, Inc [10.1145/3203217.3203282].
			
	Tutti gli autori
	
						De Prado, Miguel; Denna, Maurizio; Benini, Luca; Pazos, Nuria
					
	Appare nelle tipologie:
	
				4.01 Contributo in Atti di convegno

File in questo prodotto:

Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/677188

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

9

6

CRIS Current Research Information System