Garofalo A., Rusci M., Conti F., Rossi D., Benini L. (2019). PULP-NN: A computing library for quantized neural network inference at the edge on RISC-V based parallel ultra low power clusters. In 2019 IEEE International Conference on Electronics, Circuits and Systems (ICECS). IEEE. https://doi.org/10.1109/ICECS46596.2019.8965067
PULP-NN: A computing library for quantized neural network inference at the edge on RISC-V based parallel ultra low power clusters
Garofalo A.; Rusci M.; Conti F.; Rossi D.; Benini L.
2019
Abstract
We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V based processors. The library consists of a set of kernels for Quantized Neural Network (QNN) inference on edge devices, targeting byte and sub-byte data types down to INT-1. Our software solution exploits the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the parallelism of the cluster, improving performance by up to 63× over a baseline implementation on a single RISC-V core implementing the RV32IMC ISA. Using the PULP-NN routines, the inference of a CIFAR-10 QNN model runs in 30× and 19.6× fewer clock cycles than with the current state-of-the-art ARM CMSIS-NN library running on an STM32L4 and an STM32H7 MCU, respectively. Running the library kernels on the GAP-8 processor at its maximum-efficiency operating point, energy efficiency is 14.1× higher than on the STM32L4 and 39.5× higher than on the STM32H7.
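As a rough illustration of the mechanism the abstract refers to (this is a sketch, not the PULP-NN source), the snippet below shows an INT8 dot-product inner loop of the kind such QNN kernels are built on. On cores with the XpulpV2 DSP extension, the 4-way sum-of-dot-product builtin `__builtin_pulp_sdotsp4` retires four 8-bit multiply-accumulates per instruction, versus one per iteration on a plain RV32IMC core. The function name, loop structure, and the `USE_XPULP_SIMD` guard macro are our own illustrative choices.

```c
/* Illustrative sketch only -- not the actual PULP-NN implementation. */
#include <stdint.h>

typedef int8_t v4s __attribute__((vector_size(4))); /* 4 x int8 vector */

/* USE_XPULP_SIMD is an assumed guard: define it when building with a
 * PULP RISC-V toolchain that provides __builtin_pulp_sdotsp4. */
static int32_t dot_int8(const int8_t *a, const int8_t *b, int n)
{
    int32_t acc = 0;
#ifdef USE_XPULP_SIMD
    /* Assumes n is a multiple of 4, so the SIMD loop needs no tail. */
    for (int i = 0; i < n; i += 4) {
        v4s va = *((const v4s *)&a[i]);
        v4s vb = *((const v4s *)&b[i]);
        acc = __builtin_pulp_sdotsp4(va, vb, acc); /* 4 MACs per insn */
    }
#else
    /* Portable fallback: one multiply-accumulate per iteration,
     * roughly what a baseline RV32IMC core has to execute. */
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
#endif
    return acc;
}
```

Parallel speedup on the cluster then comes from splitting such loops across the cores, on top of the per-core SIMD gain shown here.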
File | Description | Type | License | Size | Format
---|---|---|---|---|---
Postprint_PULP-NN_A Computing Library.pdf | Postprint article (open access) | Postprint | License for free open access | 1.44 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.