We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V based processors. The library consists of a set of kernels for Quantized Neural Network (QNN) inference on edge devices, targeting byte and sub-byte data types, down to INT-1. Our software solution exploits the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the cluster's parallelism, improving performance by up to 63× with respect to a baseline implementation on a single RISC-V core implementing the RV32IMC ISA. Using the PULP-NN routines, the inference of a CIFAR-10 QNN model runs in 30× and 19.6× fewer clock cycles than the current state-of-the-art ARM CMSIS-NN library running on STM32L4 and STM32H7 MCUs, respectively. By running the library kernels on the GAP-8 processor at its maximum-efficiency operating point, the energy efficiency on GAP-8 is 14.1× higher than on the STM32L4 and 39.5× higher than on the STM32H7.
PULP-NN: A computing library for quantized neural network inference at the edge on RISC-V based parallel ultra low power clusters / Garofalo A.; Rusci M.; Conti F.; Rossi D.; Benini L. - ELECTRONIC. - (2019), pp. 8965067.33-8965067.36. (Paper presented at the 26th IEEE International Conference on Electronics, Circuits and Systems, ICECS 2019, held in Italy in 2019) [10.1109/ICECS46596.2019.8965067].
PULP-NN: A computing library for quantized neural network inference at the edge on RISC-V based parallel ultra low power clusters
Garofalo A.; Rusci M.; Conti F.; Rossi D.; Benini L.
2019
File | Size | Format
---|---|---
Postprint_PULP-NN_A Computing Library.pdf (postprint article, open access; license: free open access) | 1.44 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.