Daniele Bortolotti, Andrea Bartolini, Luca Benini (2013). An Ambient Temperature Variation Tolerance Scheme for an Ultra Low Power Shared-L1 Processor Cluster. 2013 IEEE [10.1109/DSD.2013.74].
An Ambient Temperature Variation Tolerance Scheme for an Ultra Low Power Shared-L1 Processor Cluster
Bortolotti, Daniele; Bartolini, Andrea; Benini, Luca
2013
Abstract
Near-threshold operation is today a key research area in ultra-low power (ULP) computing, as it promises a 10x improvement in energy efficiency compared to super-threshold operation and mitigates thermal bottlenecks. Unfortunately, near-threshold operation is plagued by greatly increased sensitivity to threshold voltage variations, such as those caused by ambient temperature fluctuation. In this paper we focus on a tightly coupled ULP processor cluster architecture in which a low-latency, high-bandwidth processor-to-L1-memory interconnection network plays a key role. We propose a lightweight runtime solution that tolerates ambient-temperature-induced variations by dynamically adapting the processor-to-L1-memory latency without compromising execution correctness. We extensively test our solution in different scenarios and evaluate the design trade-offs, showing the cost, performance, and reliability gains compared to state-of-the-art static solutions. Our solution reaches a performance gain of up to 25% in a typical use case scenario with a very low (≈ 4%) area overhead.