Deep Neural Networks (DNNs) computation-hungry algorithms demand hardware platforms capable of meeting rigid power and timing requirements. We introduce the Serial-MAC-engine (SMAC-engine), a fully-digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC). The accelerator is completely embedded in the form of a Hardware Processing Engine (HWPE) in the PULPissimo platform, a RISCV-based programmable architecture that targets the computational requirements of IoT applications. The SMAC-engine supports configurable precision for both weights (8/6/4 bits) and activations (8/4 bits), with scalable performance. Results in 65 nm technology demonstrate that the serial-MAC approach enables the accelerator to achieve a maximum throughput of 14.28 GMAC/s, consuming 0.58 pJ/MAC@1.0 V when operating at a precision of 4 bits for weights and 8 bits for activations.
A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things / Capra M.; Conti F.; Martina M.. - ELETTRONICO. - 2021-:(2021), pp. 192-197. (Intervento presentato al convegno 2021 IEEE International Midwest Symposium on Circuits and Systems, MWSCAS 2021 tenutosi a usa nel 2021) [10.1109/MWSCAS47672.2021.9531722].
A Multi-Precision Bit-Serial Hardware Accelerator IP for Deep Learning Enabled Internet-of-Things
Conti F.;
2021
Abstract
Deep Neural Networks (DNNs) computation-hungry algorithms demand hardware platforms capable of meeting rigid power and timing requirements. We introduce the Serial-MAC-engine (SMAC-engine), a fully-digital hardware accelerator for inference of quantized DNNs suitable for integration in a heterogeneous System-on-Chip (SoC). The accelerator is completely embedded in the form of a Hardware Processing Engine (HWPE) in the PULPissimo platform, a RISCV-based programmable architecture that targets the computational requirements of IoT applications. The SMAC-engine supports configurable precision for both weights (8/6/4 bits) and activations (8/4 bits), with scalable performance. Results in 65 nm technology demonstrate that the serial-MAC approach enables the accelerator to achieve a maximum throughput of 14.28 GMAC/s, consuming 0.58 pJ/MAC@1.0 V when operating at a precision of 4 bits for weights and 8 bits for activations.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.