Cooperation of CPU and hardware accelerator to accomplish computational intensive tasks, provides significant advantages in run-time speed and energy. Efficient management of data sharing among multiple computational kernels can rapidly turn into a complicated problem. The Accelerator coherency port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using ACP over the traditional method of sharing data on the DRAM. We select the Xilinx ZYNQ as tar- get and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both of HP and ACP AXI interfaces reach full duplex data processing bandwidth of over 1:6 GBytes/s running at 125 MHz on a XC7Z020-1C device. The effect of background DRAM and cache traffic on the performance of accelerators is analyzed. For a sample image filtering task, the cooperative operation of CPU and ACP accelerator (CPU-ACP) gains a speed-up of 1:2X over CPU and HP acceleration (CPU-HP). In terms of energy efficiency, an improvement of 2:5 nJ (> 20%) is shown for each byte of processed data. This is the first work which repre-sents detailed practical comparisons on the speed and energy efficiency of various processor-accelerator memory sharing techniques in a configurable heterogeneous platform.
Mohammadsadegh Sadri, Christian Weis, Norbert Wehn, Luca Benini (2013). Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ. New York : ACM PRESS [10.1145/2513683.2513688].
Energy and performance exploration of accelerator coherency port using Xilinx ZYNQ
SADRI, MOHAMMADSADEGH;BENINI, LUCA
2013
Abstract
Cooperation of CPU and hardware accelerator to accomplish computational intensive tasks, provides significant advantages in run-time speed and energy. Efficient management of data sharing among multiple computational kernels can rapidly turn into a complicated problem. The Accelerator coherency port (ACP) emerges as a possible solution by enabling hardware accelerators to issue coherent accesses to the memory space. In this paper, we quantify the advantages of using ACP over the traditional method of sharing data on the DRAM. We select the Xilinx ZYNQ as tar- get and develop an infrastructure to stress the ACP and high-performance (HP) AXI interfaces of the ZYNQ device. Hardware accelerators on both of HP and ACP AXI interfaces reach full duplex data processing bandwidth of over 1:6 GBytes/s running at 125 MHz on a XC7Z020-1C device. The effect of background DRAM and cache traffic on the performance of accelerators is analyzed. For a sample image filtering task, the cooperative operation of CPU and ACP accelerator (CPU-ACP) gains a speed-up of 1:2X over CPU and HP acceleration (CPU-HP). In terms of energy efficiency, an improvement of 2:5 nJ (> 20%) is shown for each byte of processed data. This is the first work which repre-sents detailed practical comparisons on the speed and energy efficiency of various processor-accelerator memory sharing techniques in a configurable heterogeneous platform.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.