L2 memory, serving multiple clusters of tightly coupled processors, is well suited for 3D integration, given its large required size and its tolerance to latency and to variations in memory access time. In this paper, we focus on the design of a synthesizable L2 memory IP component that can be attached to a cluster-based multi-core platform through its NoC ports and offers high-bandwidth memory access with low average latency. We propose a scalable 3D non-uniform memory access (NUMA) architecture, based on low-latency logarithmic interconnects, which allows stacking of multiple memory layers with identical dies, supports multiple outstanding transactions, and achieves high clock frequencies due to its highly pipelined nature. Benchmark simulation results demonstrate that adding 3D-NUMA to a multi-core NoC can result in an average performance boost of 34%. Physical synthesis results show that the 3D-NUMA memory system can operate at 500 MHz in STMicroelectronics CMOS-28nm Low Power Technology (bounded by the memory cut access time, while its logic components can operate at up to 1 GHz), with up to 8 layers (4 MB) and a memory density loss of only 16%.

A high-performance multiported L2 memory IP for scalable three-dimensional integration
2013 IEEE International 3D Systems Integration Conference (3DIC)

AZARKHISH, ERFAN;LOI, IGOR;BENINI, LUCA
2013

Pages 1-8
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/388156