Kakoee M.R., Loi I., Benini L. (2012). A resilient architecture for low latency communication in shared-L1 processor clusters. New York: IEEE Press [10.1109/DATE.2012.6176623].
A resilient architecture for low latency communication in shared-L1 processor clusters
KAKOEE, MOHAMMAD REZA;LOI, IGOR;BENINI, LUCA
2012
Abstract
A reliable and variation-tolerant architecture for shared-L1 processor clusters is proposed. The architecture uses a single-cycle mesh-of-trees as the interconnection network between the processors and a unified Tightly Coupled Data Memory (TCDM). The proposed technique compensates for the effect of process variation on processor-to-memory paths. By adding one controllable pipeline stage on the processor-to-memory paths, we can switch between two modes: with and without the pipeline. If there is no variation, the processor-to-memory path is fully combinational and read and write operations take a single cycle. If variation occurs, the controllable stage is switched to pipeline mode, and the increased latency of the read/write operation mitigates the effect of the variation. We also propose a configuration-time approach that conditionally adds the extra pipeline stage based on the detection of timing-critical paths. Experimental results show that our speed adaptation approach can compensate for up to 90% degradation in the request path with less than 1% hardware overhead for a shared-L1 CMP with 16 processors and 32 memory banks. We show that even if variation occurs on all processor-to-memory paths, our approach can mitigate it with an average overhead of 20% on the application’s runtime.
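The configuration-time selection described in the abstract can be illustrated with a minimal behavioral sketch. It is not the authors' implementation: the names `PathConfig`, `configure_paths`, and `average_latency` are hypothetical, the delay values are made up for illustration, and the model assumes that an access costs one cycle in combinational mode and two cycles when the extra pipeline stage is enabled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathConfig:
    """Mode of one processor-to-memory path (hypothetical model)."""
    pipelined: bool

    @property
    def latency_cycles(self) -> int:
        # Assumption: the extra stage adds exactly one cycle of latency.
        return 2 if self.pipelined else 1


def configure_paths(path_delays_ns: List[float],
                    clock_period_ns: float) -> List[PathConfig]:
    """Enable the pipeline stage only on paths that miss timing."""
    return [PathConfig(pipelined=delay > clock_period_ns)
            for delay in path_delays_ns]


def average_latency(configs: List[PathConfig]) -> float:
    """Mean access latency in cycles over all configured paths."""
    return sum(c.latency_cycles for c in configs) / len(configs)


# Illustrative delays only: the third path is slowed by variation.
delays = [0.9, 1.0, 1.4, 0.8]
configs = configure_paths(delays, clock_period_ns=1.0)
print([c.latency_cycles for c in configs])
print(average_latency(configs))
```

In this sketch only the path whose delay exceeds the clock period pays the extra cycle, which mirrors the idea of adding latency conditionally rather than slowing down every access.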