Analyzing Memory Interference of FPGA Accelerators on Multicore Hosts in Heterogeneous Reconfigurable SoCs

Benini L.
2021

Abstract

Reconfigurable heterogeneous systems-on-chips (SoCs) integrating multiple accelerators are cost-effective and provide the processing power required for complex embedded applications. However, to enable their use in real-time settings, it is crucial to control interference on the shared main memory so that performance remains reliable. Interference degrades performance when components such as CPUs, caches, accelerators, and DMAs issue memory requests simultaneously. We propose a methodology to characterize the interference that accelerators implemented in the FPGA fabric of reconfigurable heterogeneous SoCs cause to multicore host processors. Based on this methodology, we extend the roofline model to account for the performance degradation of the computing platform. The extended model makes it possible to determine efficiently the point at which memory interference becomes critical for a given platform and workload. We apply our methodology to a modern Xilinx UltraScale+ SoC integrating a multicore ARM Cortex-A CPU and a Kintex-grade FPGA. To the best of our knowledge, our results show experimentally for the first time that programs with intensities below 5 flop/byte (workloads with low cache locality) can suffer slowdowns of up to an order of magnitude.
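To illustrate the roofline reasoning mentioned in the abstract, the following minimal sketch computes the classic roofline bound and the slowdown that results when the memory bandwidth available to the host shrinks. It is not the extended model from the paper: modelling interference as a single reduced bandwidth value is an assumption made here for illustration, and all function names and platform numbers are hypothetical rather than measurements of the UltraScale+ SoC.

```python
def roofline(intensity, peak_flops, bandwidth):
    """Classic roofline bound: attainable flop/s at a given intensity (flop/byte)."""
    return min(peak_flops, intensity * bandwidth)


def slowdown(intensity, peak_flops, bw_isolated, bw_contended):
    """Ratio of attainable performance without and with memory interference.

    Assumption: interference is modelled only as a lower effective bandwidth.
    """
    isolated = roofline(intensity, peak_flops, bw_isolated)
    contended = roofline(intensity, peak_flops, bw_contended)
    return isolated / contended


# Hypothetical platform numbers, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 10_000_000_000      # 10 Gflop/s compute roof
BW_ISOLATED = 4_000_000_000      # 4 GB/s when the host runs alone
BW_CONTENDED = 400_000_000       # 0.4 GB/s while accelerators saturate memory

if __name__ == "__main__":
    for intensity in (0.5, 1.0, 5.0, 50.0):
        s = slowdown(intensity, PEAK_FLOPS, BW_ISOLATED, BW_CONTENDED)
        print(f"intensity {intensity:5.1f} flop/byte -> slowdown {s:.1f}x")
```

With these illustrative numbers, a memory-bound workload sees the full tenfold bandwidth loss as a slowdown, while a workload with high enough intensity stays under the compute roof in both cases and is unaffected.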
Proceedings - Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 1152-1155
Mattheeuws M.; Forsberg B.; Kurth A.; Benini L.

Use this identifier to cite or link to this document: http://hdl.handle.net/11585/870400