Several Chip-Multiprocessor designs today leverage tightly-coupled computing clusters as a building block. These clusters consist of a fairly large number N of simple cores, featuring fast communication through a shared multibanked L1 data memory and ≈ 1 Instruction-Per-Cycle (IPC) per core. Thus, aggregated I-fetch bandwidth approaches ƒ * N, where ƒ is the cluster clock frequency. An effective instruction cache architecture is key to support this I-fetch bandwidth. In this paper we compare two main architectures for instruction caching targeting tightly coupled CMP clusters: (i) private instruction caches per core and (ii) shared instruction cache per cluster. We developed a cycle-accurate model of the tightly coupled cluster with several configurable architectural parameters for exploration, plus a programming environment targeted at efficient data-parallel computing. We conduct an in-depth study of the two architectural templates based on the use of both synthetic microbenchmarks and real program workloads. Our results provide useful insights and guidelines for designers.
D. Bortolotti, F. Paterna, C. Pinto, A. Marongiu, M. Ruggiero, L. Benini (2011). Exploring Instruction caching strategies for tightly-coupled shared-memory clusters. NEW YORK : IEEE Press [10.1109/ISSOC.2011.6089691].
Exploring Instruction caching strategies for tightly-coupled shared-memory clusters
BORTOLOTTI, DANIELE;PATERNA, FRANCESCO;PINTO, CHRISTIAN;MARONGIU, ANDREA;RUGGIERO, MARTINO;BENINI, LUCA
2011
Abstract
Several Chip-Multiprocessor designs today leverage tightly-coupled computing clusters as a building block. These clusters consist of a fairly large number N of simple cores, featuring fast communication through a shared multibanked L1 data memory and ≈ 1 Instruction-Per-Cycle (IPC) per core. Thus, aggregated I-fetch bandwidth approaches ƒ * N, where ƒ is the cluster clock frequency. An effective instruction cache architecture is key to support this I-fetch bandwidth. In this paper we compare two main architectures for instruction caching targeting tightly coupled CMP clusters: (i) private instruction caches per core and (ii) shared instruction cache per cluster. We developed a cycle-accurate model of the tightly coupled cluster with several configurable architectural parameters for exploration, plus a programming environment targeted at efficient data-parallel computing. We conduct an in-depth study of the two architectural templates based on the use of both synthetic microbenchmarks and real program workloads. Our results provide useful insights and guidelines for designers.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.