A shared-L1 cache architecture is proposed for tightly coupled processor clusters. Sharing an L1 tightly coupled data memory (TCDM) among a significant number of processors (up to 16) is challenging in terms of speed. Sharing an L1 cache is even more challenging: it eases programming, but its operation is more complex. The performance feasibility of a shared TCDM was demonstrated in the STMicroelectronics Platform 2012, but the performance cost of supporting a shared L1 cache remains to be proven. In this paper, we show that replacing the TCDM with a multi-banked shared-L1 cache imposes limited speed overhead, although it comes at a cost in area and power. We explore the shared-L1 cache architecture across different numbers of processing elements (PEs) and cache banks. Experimental results show that our multi-banked shared-L1 cache can operate at almost the same frequency as the related TCDM architecture if the cache controller uses a cache line of 4 words. Results also show that the area overhead with respect to TCDM is less than 18% for a cluster containing 16 Leon3 processors and 32 cache banks. We also show that the overhead on MIPS/Watt and MIPS/mm² ranges from 5% to 30%, depending on the size of the processor in the cluster, for a 16x32 configuration (16 cores and 32 cache/memory banks).

M. R. Kakoee, V. Petrovic, L. Benini (2012). A Multi-banked Shared-L1 Cache Architecture for Tightly Coupled Processor Clusters. New York: IEEE Press [10.1109/ISSoC.2012.6376362].

A Multi-banked Shared-L1 Cache Architecture for Tightly Coupled Processor Clusters

KAKOEE, MOHAMMAD REZA;BENINI, LUCA
2012

2012 International Symposium on System on Chip (SoC), pp. 1-5
M. R. Kakoee; V. Petrovic; L. Benini

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/132961