The instruction memory hierarchy plays a critical role in the performance and energy efficiency of ultralow-power (ULP) processors for Internet-of-Things (IoT) end-nodes. This is mainly due to the extremely tight power envelope and area budgets, which imply small instruction caches (I-Caches) operating at very low (near-threshold) supply voltages. The challenge is aggravated by the fact that multiple processors, fetching in parallel, require substantial bandwidth from the I-Caches. In this letter, we propose a low-cost and energy-efficient hybrid instruction-prefetching mechanism to be integrated with a ULP multicore cluster. We study its performance for a wide range of IoT applications, from cryptography to computer vision, and show that it can effectively improve the hit rate of almost all of them to above 95% (an average performance improvement of over 2×). In addition, we designed our prefetcher and integrated it in a 4-core cluster in 28 nm fully-depleted silicon-on-insulator (FDSOI) technology. We show that the system's power consumption increases by only about 11% and silicon area by less than 1%. Altogether, a total energy reduction of 1.9× is achieved, thanks to the more than 2× performance improvement, enabling a significantly longer battery life.
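
The relation between the reported power, performance, and energy figures can be sanity-checked with a short calculation. The sketch below assumes the simplest possible model, energy = average power × runtime, which the abstract does not state explicitly; the function names `energy_reduction` and `speedup_needed` are hypothetical and introduced only for illustration.

```python
def energy_reduction(speedup: float, power_increase: float) -> float:
    """Return E_baseline / E_with_prefetcher.

    Assumes energy = average power * runtime, so a design that runs
    `speedup` times faster while drawing (1 + power_increase) times the
    power uses (1 + power_increase) / speedup of the baseline energy.
    """
    if speedup <= 0 or power_increase <= -1:
        raise ValueError("speedup must be > 0 and power_increase > -1")
    return speedup / (1.0 + power_increase)


def speedup_needed(target_reduction: float, power_increase: float) -> float:
    """Return the speedup required to reach a given energy reduction."""
    if target_reduction <= 0 or power_increase <= -1:
        raise ValueError("target_reduction must be > 0 and power_increase > -1")
    return target_reduction * (1.0 + power_increase)


if __name__ == "__main__":
    # Exactly 2x faster with 11% more power (figures from the abstract).
    print(f"{energy_reduction(2.0, 0.11):.2f}")
    # Speedup implied by a 1.9x energy reduction at 11% more power.
    print(f"{speedup_needed(1.9, 0.11):.2f}")
```

Under this assumed model, a speedup of exactly 2× with 11% more power gives roughly a 1.8× energy reduction, and a 1.9× reduction corresponds to a speedup of about 2.1×, which is consistent with the abstract's "more than 2×" wording.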

A Hybrid Instruction Prefetching Mechanism for Ultra Low-Power Multicore Clusters / Payami, Maryam*; Azarkhish, Erfan; Loi, Igor; Benini, Luca. - In: IEEE EMBEDDED SYSTEMS LETTERS. - ISSN 1943-0663. - Print. - 9:4(2017), art. no. 7933223, pp. 125-128. [10.1109/LES.2017.2707978]

A Hybrid Instruction Prefetching Mechanism for Ultra Low-Power Multicore Clusters

Payami, Maryam; Azarkhish, Erfan; Loi, Igor; Benini, Luca
2017

Files in this item:
File: A Hybrid Instruction Prefetching Mechanism_fulltext.pdf
Access: open access
Description: Article, postprint version
Type: Postprint
License: Free open-access license
Size: 806.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/624042