Several chip-multiprocessor (CMP) designs today use tightly coupled computing clusters as a building block. These clusters consist of a fairly large number N of simple cores, which communicate quickly through a shared multibanked L1 data memory and each sustain ≈ 1 instruction per cycle (IPC). The aggregate instruction-fetch (I-fetch) bandwidth therefore approaches f × N, where f is the cluster clock frequency. An effective instruction cache architecture is key to supporting this I-fetch bandwidth. In this paper we compare two main instruction caching architectures for tightly coupled CMP clusters: (i) a private instruction cache per core and (ii) a shared instruction cache per cluster. We developed a cycle-accurate model of the tightly coupled cluster with several configurable architectural parameters for exploration, plus a programming environment targeted at efficient data-parallel computing. We conduct an in-depth study of the two architectural templates using both synthetic microbenchmarks and real program workloads. Our results provide useful insights and guidelines for designers.
Title: | Exploring Instruction caching strategies for tightly-coupled shared-memory clusters |
Author(s): | BORTOLOTTI, DANIELE; PATERNA, FRANCESCO; PINTO, CHRISTIAN; MARONGIU, ANDREA; RUGGIERO, MARTINO; BENINI, LUCA |
Year: | 2011 |
Book title: | System on Chip (SoC), 2011 International Symposium on |
First page: | 34 |
Last page: | 41 |
Digital Object Identifier (DOI): | http://dx.doi.org/10.1109/ISSOC.2011.6089691 |
Date finalized in UGOV: | 26-Jun-2013 |
Appears in types: | 4.01 Contribution in conference proceedings |