Endpoint devices for Internet-of-Things not only need to work under extremely tight power envelope of a few milliwatts, but also need to be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and the performance scalability can be gained through parallelism. In this paper, we describe the design of an open-source RISC-V processor core specifically designed for NT operation in tightly coupled multicore clusters. We introduce instruction extensions and microarchitectural optimizations to increase the computational density and to minimize the pressure toward the shared-memory hierarchy. For typical data-intensive sensor processing workloads, the proposed core is, on average, 3.5$x faster and 3.2x more energy efficient, thanks to a smart L0 buffer to reduce cache access contentions and support for compressed instructions. Single Instruction Multiple Data extensions, such as dot products, and a built-in L0 storage further reduce the shared-memory accesses by 8x reducing contentions by 3.2x. With four NT-optimized cores, the cluster is operational from 0.6 to 1.2 V, achieving a peak efficiency of 67 MOPS/mW in a low-cost 65-nm bulk CMOS technology. In a low-power 28-nm FD-SOI process, a peak efficiency of 193 MOPS/mW (40 MHz and 1 mW) can be achieved.
Gautschi, M., Schiavone, P.D., Traber, A., Loi, I., Pullini, A., Rossi, D., et al. (2017). Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 25(10), 2700-2713 [10.1109/TVLSI.2017.2654506].
Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices
LOI, IGOR;PULLINI, ANTONIO;ROSSI, DAVIDE;FLAMAND, ERIC;BENINI, LUCA
2017
Abstract
Endpoint devices for Internet-of-Things not only need to work under extremely tight power envelope of a few milliwatts, but also need to be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and the performance scalability can be gained through parallelism. In this paper, we describe the design of an open-source RISC-V processor core specifically designed for NT operation in tightly coupled multicore clusters. We introduce instruction extensions and microarchitectural optimizations to increase the computational density and to minimize the pressure toward the shared-memory hierarchy. For typical data-intensive sensor processing workloads, the proposed core is, on average, 3.5$x faster and 3.2x more energy efficient, thanks to a smart L0 buffer to reduce cache access contentions and support for compressed instructions. Single Instruction Multiple Data extensions, such as dot products, and a built-in L0 storage further reduce the shared-memory accesses by 8x reducing contentions by 3.2x. With four NT-optimized cores, the cluster is operational from 0.6 to 1.2 V, achieving a peak efficiency of 67 MOPS/mW in a low-cost 65-nm bulk CMOS technology. In a low-power 28-nm FD-SOI process, a peak efficiency of 193 MOPS/mW (40 MHz and 1 mW) can be achieved.File | Dimensione | Formato | |
---|---|---|---|
GAUTSCHI_TVLSI_2017_disclaimer.pdf
accesso aperto
Tipo:
Postprint
Licenza:
Licenza per accesso libero gratuito
Dimensione
6.02 MB
Formato
Adobe PDF
|
6.02 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.