Mohammad Reza Kakoee, Igor Loi, Luca Benini (2012). Variation-Tolerant Architecture for Ultra Low Power Shared-L1 Processor Clusters. IEEE Transactions on Circuits and Systems II: Express Briefs, 59(12), 927-931 [10.1109/TCSII.2012.2231039].
Variation-Tolerant Architecture for Ultra Low Power Shared-L1 Processor Clusters
Kakoee, Mohammad Reza; Loi, Igor; Benini, Luca
2012
Abstract
In this brief, we propose a variation-tolerant architecture for shared-L1 processor clusters operating in the near-threshold (NT) region. Delay variations, which are exacerbated by moving to the NT region, degrade processor-to-memory communication; our technique compensates for them by adding one or two controllable pipeline stages. Moreover, we propose a reconfigurable address-interleaving technique that allows some memory blocks to be shut down, either because variation has made them too slow or because the application does not need them (which reduces power consumption). Experimental results show that our speed adaptation approach can compensate for up to 90% degradation in the request path with less than 2% hardware overhead for a shared-L1 cluster with 16 processors and 32 memory banks. The configurable interleaving technique adds a 10% overhead on the request timing path of a 16 × 32 interconnection network.
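The idea behind reconfigurable address interleaving can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware scheme from the brief: the 4-byte word size, the function name `map_address`, and the use of a plain modulo over the list of active banks are all hypothetical choices made for illustration. It shows how consecutive words spread across banks, and how the same addresses are remapped when some banks are excluded.

```python
WORD_BYTES = 4  # assumption: 32-bit words; the abstract does not state the word size


def map_address(addr, active_banks):
    """Map a byte address to (bank_id, row) with word-level interleaving.

    Only the banks listed in `active_banks` receive traffic, so a bank that
    is too slow or not needed can be left out and powered down.
    Hypothetical model: real hardware would likely restrict the number of
    active banks so that the mapping stays cheap to compute.
    """
    if not active_banks:
        raise ValueError("at least one bank must be active")
    word = addr // WORD_BYTES
    slot = word % len(active_banks)   # which active bank serves this word
    row = word // len(active_banks)   # word offset inside that bank
    return active_banks[slot], row


all_banks = list(range(32))                      # the 32-bank configuration
half_banks = list(range(16))                     # upper 16 banks shut down
no_bank_3 = [b for b in range(32) if b != 3]     # one slow bank excluded

print(map_address(0x80, all_banks))   # word 32 wraps back to bank 0
print(map_address(0x40, half_banks))  # word 16 wraps back to bank 0
print(map_address(0x0C, no_bank_3))   # word 3 skips the excluded bank
```

In this model, shutting a bank down only changes the list passed to `map_address`; the address space seen by the processors stays contiguous, at the cost of fewer banks sharing the same traffic.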