Many applications require both high performance and predictable timing. High-performance can be provided by COTS Multi-Core System on Chips (MPSoC), however, as cores in these systems share the memory bandwidth they are susceptible to interference from each other, which is a problem for timing predictability. We achieve predictability on multi-cores by employing the predictable execution model (PREM), which splits execution into a sequence of memory and compute phases, and schedules these such that only a single core is executing a memory phase at a time. We present a toolchain consisting of a compiler and an Integer Linear Programming scheduling model. Our compiler uses loop analysis and tiling to transform application code into PREM compliant binaries. Furthermore, we solve the problem of scheduling execution on multiple cores while preventing interference of memory phases. We evaluate our toolchain on Advanced-Driver-Assistance-Systems-like scenario containing matrix multiplications and FFT computations on NVIDIA TX1. The results show that our approach maintains similar average performance and improves variance of completion times by a factor of 9.
Matějka, J., Hanzálek, Z., Forsberg, B., Benini, L., Sojka, M., Marongiu, A. (2018). Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution. Association for Computing Machinery, Inc [10.1145/3178442.3178444].
Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution
Benini, Luca;Marongiu, Andrea
2018
Abstract
Many applications require both high performance and predictable timing. High-performance can be provided by COTS Multi-Core System on Chips (MPSoC), however, as cores in these systems share the memory bandwidth they are susceptible to interference from each other, which is a problem for timing predictability. We achieve predictability on multi-cores by employing the predictable execution model (PREM), which splits execution into a sequence of memory and compute phases, and schedules these such that only a single core is executing a memory phase at a time. We present a toolchain consisting of a compiler and an Integer Linear Programming scheduling model. Our compiler uses loop analysis and tiling to transform application code into PREM compliant binaries. Furthermore, we solve the problem of scheduling execution on multiple cores while preventing interference of memory phases. We evaluate our toolchain on Advanced-Driver-Assistance-Systems-like scenario containing matrix multiplications and FFT computations on NVIDIA TX1. The results show that our approach maintains similar average performance and improves variance of completion times by a factor of 9.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.