A shared-L1 cache architecture is proposed for tightly coupled processor clusters. Sharing an L1 tightly coupled data memory (TCDM) among a significant number of processors (up to 16) is challenging in terms of speed. Sharing an L1 cache is even more challenging because its operation is more complex, although it eases programming. The feasibility of shared TCDM in terms of performance was shown in the STMicroelectronics Platform 2012, but the performance cost of supporting a shared L1 cache remains to be proven. In this paper we show that replacing TCDM with a multi-banked shared-L1 cache imposes limited speed overhead, at the cost of additional area and power. We explore the shared-L1 cache architecture in terms of the number of processing elements (PEs) and cache banks. Experimental results show that our multi-banked shared-L1 cache can operate at almost the same frequency as the related TCDM architecture if the cache controller uses a cache line of 4 words. Results also show that the area overhead with respect to TCDM is less than 18% for a cluster containing 16 Leon3 processors and 32 cache banks. We also show that the overhead on MIPS/Watt and MIPS/mm² ranges from 5% to 30%, depending on the size of the processors in the cluster, for a 16x32 configuration (16 cores and 32 cache/memory banks).
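To make the 16x32 configuration concrete, the following is a minimal sketch of how requests from 16 PEs might be distributed over 32 banks with a 4-word cache line. The line-interleaved mapping, the 32-bit word size, and the names `bank_of` and `conflicts` are assumptions for illustration only; the abstract does not specify the address mapping or arbitration scheme used in the paper.

```python
from collections import Counter

WORD_BYTES = 4                        # assumption: 32-bit words
LINE_WORDS = 4                        # cache line of 4 words (from the abstract)
NUM_BANKS = 32                        # 16x32 configuration (from the abstract)
NUM_PES = 16
LINE_BYTES = WORD_BYTES * LINE_WORDS  # 16 bytes per line


def bank_of(byte_addr: int) -> int:
    """Hypothetical line-interleaved mapping: consecutive lines map to consecutive banks."""
    return (byte_addr // LINE_BYTES) % NUM_BANKS


def conflicts(addrs) -> int:
    """Count requests that would have to wait because another request
    issued in the same cycle targets the same bank."""
    per_bank = Counter(bank_of(a) for a in addrs)
    return sum(n - 1 for n in per_bank.values())


# Each PE reads a different consecutive line: requests spread over 16 banks.
consecutive_lines = [pe * LINE_BYTES for pe in range(NUM_PES)]

# Each PE uses a stride of NUM_BANKS lines: every request lands on bank 0.
same_bank_stride = [pe * LINE_BYTES * NUM_BANKS for pe in range(NUM_PES)]

print(conflicts(consecutive_lines))   # 0 waiting requests
print(conflicts(same_bank_stride))    # 15 waiting requests
```

Under these assumptions, having more banks than PEs lowers the chance that two requests in the same cycle collide, which is one plausible reason to explore configurations such as 16 cores with 32 banks.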
M. R. Kakoee, V. Petrovic, L. Benini (2012). A Multi-banked Shared-L1 Cache Architecture for Tightly Coupled Processor Clusters. New York: IEEE Press. doi:10.1109/ISSoC.2012.6376362.
A Multi-banked Shared-L1 Cache Architecture for Tightly Coupled Processor Clusters
KAKOEE, MOHAMMAD REZA;BENINI, LUCA
2012