Das, S., Martin, K. J. M., Rossi, D., Coussy, P., Benini, L. (2019). An Energy-Efficient Integrated Programmable Array Accelerator and Compilation flow for Near-Sensor Ultralow Power Processing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38(6), 1095-1108. doi: 10.1109/TCAD.2018.2834397.

An Energy-Efficient Integrated Programmable Array Accelerator and Compilation flow for Near-Sensor Ultralow Power Processing

Das, Satyajit; Martin, Kevin J. M.; Rossi, Davide; Coussy, Philippe; Benini, Luca
2019

Abstract

In this paper we take a fresh look at Coarse-Grained Reconfigurable Arrays (CGRAs) as ultralow-power accelerators for near-sensor processing. We present a general-purpose Integrated Programmable-Array accelerator (IPA) that exploits a novel architecture, execution model, and compilation flow for application mapping, and that can handle kernels containing complex control flow without the significant energy overhead incurred by state-of-the-art predication approaches. To optimize performance and energy efficiency, we explore the IPA architecture with a special focus on shared memory access, with the help of the flexible compilation flow presented in this paper. Compared with state-of-the-art partial and full predication techniques, we achieve a maximum energy gain of 2× and performance gains of 1.33× and 1.8×, respectively. Operating at 100 MHz and 0.6 V in 28 nm UTBB FD-SOI technology, the proposed accelerator achieves an average energy efficiency of 1617 MOPS/mW over a wide range of near-sensor processing kernels. Compared to a core specialized for ultralow-power near-sensor processing, this is an improvement of up to 18× (9.23× on average), together with a speed-up of up to 20.3× (9.7× on average).
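As a rough sanity check on the headline efficiency figure, the average of 1617 MOPS/mW can be inverted into an approximate energy per operation using only SI prefixes (1 mW = 10^-3 J/s). This is an indicative figure, not one reported by the authors: the reciprocal of an average efficiency is not exactly the average energy per operation across kernels, and what counts as one "operation" follows the paper's own MOPS metric.

```latex
1617\ \frac{\text{MOPS}}{\text{mW}}
  = \frac{1617 \times 10^{6}\ \text{op/s}}{10^{-3}\ \text{J/s}}
  = 1.617 \times 10^{12}\ \text{op/J}
\quad\Longrightarrow\quad
E_{\text{op}} \approx \frac{1}{1.617 \times 10^{12}}\ \text{J}
  \approx 0.62\ \text{pJ per operation}
```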
Files in this item:
TCAD17_postprint.pdf
Open access
Type: Postprint
License: Free open-access license
Size: 1.48 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/653384
Citations: Scopus 16; ISI 11