We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V based processors. The library consists of a set of kernels for Quantized Neural Network (QNN) inference on edge devices, targeting byte and sub-byte data types, down to INT-1. Our software solution exploits the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the cluster's parallelism, improving performance by up to 63× with respect to a baseline implementation on a single RISC-V core implementing the RV32IMC ISA. Using the PULP-NN routines, the inference of a CIFAR-10 QNN model runs in 30× and 19.6× less clock cycles than the current state-of-the-art ARM CMSIS-NN library, running on an STM32L4 and an STM32H7 MCUs, respectively. By running the library kernels on the GAP-8 processor at the maximum efficiency operating point, the energy efficiency on GAP-8 is 14.1× higher than STM32L4 and 39.5× than STM32H7.

PULP-NN: A computing library for quantized neural network inference at the edge on RISC-V based parallel ultra low power clusters

Garofalo A.;Rusci M.;Conti F.
;
Rossi D.;Benini L.
2019

Abstract

We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V based processors. The library consists of a set of kernels for Quantized Neural Network (QNN) inference on edge devices, targeting byte and sub-byte data types, down to INT-1. Our software solution exploits the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the cluster's parallelism, improving performance by up to 63× with respect to a baseline implementation on a single RISC-V core implementing the RV32IMC ISA. Using the PULP-NN routines, the inference of a CIFAR-10 QNN model runs in 30× and 19.6× less clock cycles than the current state-of-the-art ARM CMSIS-NN library, running on an STM32L4 and an STM32H7 MCUs, respectively. By running the library kernels on the GAP-8 processor at the maximum efficiency operating point, the energy efficiency on GAP-8 is 14.1× higher than STM32L4 and 39.5× than STM32H7.
2019 26th IEEE International Conference on Electronics, Circuits and Systems, ICECS 2019
33
36
Garofalo A.; Rusci M.; Conti F.; Rossi D.; Benini L.
File in questo prodotto:
File Dimensione Formato  
Postprint_PULP-NN_A Computing Library.pdf

accesso aperto

Descrizione: Articolo postprint
Tipo: Postprint
Licenza: Licenza per accesso libero gratuito
Dimensione 1.44 MB
Formato Adobe PDF
1.44 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/767263
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 11
social impact