Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, et al. (2023). Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC With 2-8 b DNN Acceleration and 30%-Boost Adaptive Body Biasing. IEEE Journal of Solid-State Circuits, 59(1), 128-142. doi: 10.1109/JSSC.2023.3318301.
Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC With 2-8 b DNN Acceleration and 30%-Boost Adaptive Body Biasing
Francesco Conti; Angelo Garofalo; Davide Rossi; Gianmarco Ottavi; Luca Benini
2023
Abstract
Emerging artificial intelligence-enabled Internet-of-Things (AI-IoT) systems-on-chip (SoCs) for augmented reality, personalized healthcare, and nanorobotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions: compute-intensive but strongly quantized deep neural network (DNN) inference, as well as signal processing and control requiring high-precision floating point. We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22 nm FDX that combines: 1) a general-purpose cluster of 16 RISC-V digital signal processing (DSP) cores tuned for the execution of a diverse range of workloads, exploiting 4- and 2-bit arithmetic extensions (XpulpNN) combined with fused multiply-accumulate (MAC) and LOAD operations and floating-point support; 2) a 2-8 bit reconfigurable binary engine (RBE) to accelerate 3×3 and 1×1 (pointwise) convolutions in DNNs; and 3) a set of on-chip monitoring (OCM) blocks connected to an adaptive body biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages. Marsellus achieves up to 180 Gop/s or 3.32 Top/s/W on 2-bit precision arithmetic in software, and up to 637 Gop/s or 12.4 Top/s/W on hardware-accelerated DNN layers.
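As a rough illustration of the low-precision software path described in the abstract, the sketch below is a plain-C model of a 2-bit packed dot product, i.e. the arithmetic pattern that SIMD extensions such as XpulpNN accelerate. The function names (`dotp_2bit`, `sext2`), the 16-lane packing, and the encoding are assumptions for illustration only and do not reproduce the actual XpulpNN instructions or their semantics.

```c
#include <stdint.h>
#include <stdio.h>

/* Plain-C illustration (not the real XpulpNN ISA): a dot product over
 * sixteen signed 2-bit lanes packed into each 32-bit word, i.e. the
 * arithmetic pattern that 2-bit SIMD MAC extensions accelerate. */

/* Sign-extend a 2-bit field (0..3) to a 32-bit signed value in {-2,...,1}. */
static inline int32_t sext2(uint32_t field)
{
    return (int32_t)(field << 30) >> 30;
}

/* 16 two-bit MACs per call: one lane-wise multiply-and-accumulate step. */
static int32_t dotp_2bit(uint32_t a_packed, uint32_t b_packed)
{
    int32_t acc = 0;
    for (int lane = 0; lane < 16; lane++) {
        int32_t a = sext2((a_packed >> (2 * lane)) & 0x3u);
        int32_t b = sext2((b_packed >> (2 * lane)) & 0x3u);
        acc += a * b;
    }
    return acc;
}

int main(void)
{
    uint32_t ones = 0x55555555u;            /* every lane holds +1 (binary 01) */
    printf("%d\n", dotp_2bit(ones, ones));  /* 16 lanes of 1*1 -> prints 16 */
    return 0;
}
```

Executed in hardware as single SIMD instructions, and paired with the fused MAC-and-LOAD operations mentioned above, this 16-MAC inner step is broadly the kind of work behind the quoted software figures. Dividing throughput by efficiency (180 Gop/s ÷ 3.32 Top/s/W ≈ 54 mW; 637 Gop/s ÷ 12.4 Top/s/W ≈ 51 mW) shows that both operating points sit within the tens-of-mW envelope stated in the abstract.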
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Binder3.pdf | open access | Postprint | free open-access license | 16.92 MB | Adobe PDF