A 142MOPS/mW integrated programmable array accelerator for smart visual processing

Das, Satyajit; Rossi, Davide; Benini, Luca
2017

Abstract

Due to the increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient programmable accelerators. This paper proposes an Integrated Programmable-Array accelerator (IPA) architecture based on an innovative execution model, designed to accelerate both the data-flow and control-flow parts of deeply embedded vision applications typical of edge nodes of the Internet of Things (IoT). We demonstrate the performance and energy efficiency of the IPA by implementing a smart visual trigger application. Experimental results show that the proposed accelerator delivers 507 MOPS and 142 MOPS/mW on the target application, surpassing a low-power processor optimized for DSP applications by 6x in performance and by 10x in energy efficiency. Moreover, it outperforms state-of-the-art CGRAs, which can implement only the data-flow portion of applications, by 1.6x, demonstrating the effectiveness of the proposed architecture and computational model.
Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4
Das, Satyajit; Rossi, Davide; Martin, Kevin J. M.; Coussy, Philippe; Benini, Luca
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/653372