Nadalini, A., Rutishauser, G., Burrello, A., Bruschi, N., Garofalo, A., Benini, L., et al. (2023). A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks. In 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE. doi: 10.1109/ISVLSI59464.2023.10238679
A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks
Nadalini, A.; Burrello, A.; Bruschi, N.; Garofalo, A.; Benini, L.; Conti, F.; Rossi, D.
2023
Abstract
The emerging trend of deploying complex algorithms, such as Deep Neural Networks (DNNs), increasingly poses strict memory and energy efficiency requirements on Internet-of-Things (IoT) end-nodes. Mixed-precision quantization has been proposed as a technique to minimize a DNN's memory footprint and maximize its execution efficiency, with negligible end-to-end accuracy degradation. In this work, we present a novel hardware and software stack for energy-efficient inference of mixed-precision Quantized Neural Networks (QNNs). We introduce Flex-V, a processor based on the RISC-V Instruction Set Architecture (ISA) that features fused Mac&Load mixed-precision dot-product instructions; to avoid the exponential growth of the encoding space due to mixed-precision variants, we encode the operand formats into the Control and Status Registers (CSRs). The Flex-V core is integrated into a tightly-coupled cluster of eight processors; in addition, we provide a full framework for the end-to-end deployment of DNNs, including a compiler, optimized libraries, and a memory-aware deployment flow. Our results show up to 91.5 MAC/cycle and 3.26 TOPS/W on the cluster, implemented in a commercial 22nm FDX technology, with up to 8.5x speed-up and an area overhead of only 5.6% with respect to the baseline. To demonstrate the capabilities of the architecture, we benchmark it on end-to-end, real-life QNNs, improving performance by 2x-2.5x with respect to existing solutions based on fully programmable processors.
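To give a rough feel for the arithmetic that the fused Mac&Load instructions accelerate, the C sketch below emulates one step of a mixed-precision SIMD dot product: eight signed 4-bit weights packed into a single 32-bit word, multiplied against eight signed 8-bit activations and accumulated into a 32-bit sum. This is only a minimal sketch; the function name `sdotp_w4_a8`, the LSB-first packing layout, and the operand shapes are illustrative assumptions, not the paper's actual ISA encoding, and the implicit "load next operands" behavior of the fused instruction is not modeled here.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical emulation of one mixed-precision dot-product step:
 * 8 signed 4-bit weights packed LSB-first in one 32-bit word,
 * multiplied with 8 signed 8-bit activations and accumulated into
 * a 32-bit sum. A fused Mac&Load instruction would also fetch the
 * next operands in the same cycle; only the arithmetic is modeled. */
static int32_t sdotp_w4_a8(uint32_t w_packed, const int8_t a[8], int32_t acc)
{
    for (int i = 0; i < 8; i++) {
        /* Extract the i-th 4-bit field and sign-extend it. */
        int32_t w = (int32_t)((w_packed >> (4 * i)) & 0xF);
        if (w & 0x8)
            w -= 16;               /* sign extension for 4-bit values */
        acc += w * (int32_t)a[i];  /* multiply-accumulate */
    }
    return acc;
}

int main(void)
{
    /* Pack eight 4-bit weights: 1, -1, 2, -2, 3, -3, 4, -4 (LSB-first). */
    const int8_t wv[8] = {1, -1, 2, -2, 3, -3, 4, -4};
    uint32_t w = 0;
    for (int i = 0; i < 8; i++)
        w |= ((uint32_t)wv[i] & 0xF) << (4 * i);

    const int8_t a[8] = {10, 10, 10, 10, 10, 10, 10, 10};
    printf("dot = %d\n", sdotp_w4_a8(w, a, 0)); /* weights sum to 0 -> 0 */
    return 0;
}
```

Because eight 4-bit weights fit in one 32-bit register, a single such instruction performs eight MACs; combining this with implicit operand loads and selecting the operand formats via CSRs (rather than separate opcodes per precision pair) is one plausible way the reported MAC/cycle figures and the contained encoding space can both be achieved.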
File | Type | License | Size | Format
---|---|---|---|---
a 3 tops w risc post print .pdf (open access) | Postprint | License for free, open access | 644.97 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.