We present Mr. Wolf, a Parallel Ultra Low Power (PULP) SoC featuring a hierarchical architecture with a small (12KG) microcontroller class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an 8-cores floating-point capable processing engine available on demand. The proposed SoC, implemented in a 40 nm LP CMOS technology, features a 108 μW fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6Gbit/s in less than 2.5mW. The 8-core compute cluster achieves a peak performance of 850 millions of 32-bit integer multiply and accumulate per second (MMAC/s), 500 millions of 32-bit floating-point multiply and accumulate per second (MFMAC/s)-1 GFLOP/s-with an energy-efficiency up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-ON IOT end-nodes improving performance by several orders of magnitude with respect to traditional single core MCUs within a power envelope of 153 mW.

Pullini, A., Rossi, D., Loi, I., Di Mauro, A., Benini, L. (2018). Mr. Wolf: A 1 GFLOP/s Energy-Proportional Parallel Ultra Low Power SoC for IOT Edge Processing. Institute of Electrical and Electronics Engineers Inc. [10.1109/ESSCIRC.2018.8494247].

Mr. Wolf: A 1 GFLOP/s Energy-Proportional Parallel Ultra Low Power SoC for IOT Edge Processing

Pullini, Antonio;Rossi, Davide;Loi, Igor;Benini, Luca
2018

Abstract

We present Mr. Wolf, a Parallel Ultra Low Power (PULP) SoC featuring a hierarchical architecture with a small (12KG) microcontroller class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an 8-cores floating-point capable processing engine available on demand. The proposed SoC, implemented in a 40 nm LP CMOS technology, features a 108 μW fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6Gbit/s in less than 2.5mW. The 8-core compute cluster achieves a peak performance of 850 millions of 32-bit integer multiply and accumulate per second (MMAC/s), 500 millions of 32-bit floating-point multiply and accumulate per second (MFMAC/s)-1 GFLOP/s-with an energy-efficiency up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-ON IOT end-nodes improving performance by several orders of magnitude with respect to traditional single core MCUs within a power envelope of 153 mW.
2018
ESSCIRC 2018 - IEEE 44th European Solid State Circuits Conference
130
133
Pullini, A., Rossi, D., Loi, I., Di Mauro, A., Benini, L. (2018). Mr. Wolf: A 1 GFLOP/s Energy-Proportional Parallel Ultra Low Power SoC for IOT Edge Processing. Institute of Electrical and Electronics Engineers Inc. [10.1109/ESSCIRC.2018.8494247].
Pullini, Antonio; Rossi, Davide; Loi, Igor; Di Mauro, Alfio; Benini, Luca
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/653388
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 37
  • ???jsp.display-item.citation.isi??? 29
social impact