Cooperation between a CPU and hardware accelerators on computationally intensive tasks provides significant advantages in run-time speed and energy. However, efficient management of data sharing among multiple computational kernels can rapidly turn into a complicated problem. The accelerator coherency port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using the ACP over the traditional method of sharing data through DRAM. We select the Xilinx ZYNQ as the target and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both the HP and ACP AXI interfaces reach a full-duplex data processing bandwidth of over 1.6 GBytes/s when running at 125 MHz on an XC7Z020-1C device. The effect of background DRAM and cache traffic on the performance of the accelerators is analyzed. For a sample image filtering task, the cooperative operation of the CPU and an ACP accelerator (CPU-ACP) gains a speed-up of 1.2x over CPU and HP acceleration (CPU-HP). In terms of energy efficiency, an improvement of 2.5 nJ (> 20%) is shown for each byte of processed data. This is the first work which presents detailed practical comparisons of the speed and energy efficiency of various processor-accelerator memory sharing techniques in a configurable heterogeneous platform.

Mohammadsadegh Sadri, Christian Weis, Norbert Wehn, Luca Benini (2013). Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ. New York : ACM PRESS [10.1145/2513683.2513688].

Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ

SADRI, MOHAMMADSADEGH;BENINI, LUCA
2013

Proceedings of the 10th FPGAworld Conference (FPGAworld '13), pp. 1-8
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/307125