Loi, I., Rossi, D., Haugou, G., Gautschi, M., Benini, L. (2015). Exploring multi-banked shared-L1 program cache on ultra-low power, tightly coupled processor clusters. Association for Computing Machinery, Inc [10.1145/2742854.2747288].
Exploring multi-banked shared-L1 program cache on ultra-low power, tightly coupled processor clusters
LOI, IGOR;ROSSI, DAVIDE;BENINI, LUCA
2015
Abstract
L1 instruction caches in many-core systems represent a sizable fraction of the total power consumption. Although large instruction caches can significantly improve performance, they can also increase power consumption. Private caches are usually able to achieve higher speed, due to their simpler design, but the smaller L1 memory space seen by each core induces a high miss ratio. A shared instruction cache can be seen as an attractive solution to improve performance and energy efficiency while reducing area. In this paper we propose a multi-banked, shared instruction cache architecture suitable for ultra-low power multicore systems, where parallelism and near-threshold operation are used to achieve minimum energy. We implemented the cluster architecture with different configurations of cache sharing, using the 28 nm UTBB FD-SOI process from STMicroelectronics as the reference technology. Experimental results, based on several real-life applications, demonstrate that the sharing mechanisms have no impact on the system operating frequency and either reduce the energy consumption of the cache subsystem by up to 10% at the same area footprint, or reduce the overall shared cache area by 2x at the same performance and energy efficiency, relative to a cluster of processing elements with private program caches.


