Colagrande, L., Benini, L. (2024). Optimizing Offload Performance in Heterogeneous MPSoCs. Institute of Electrical and Electronics Engineers Inc. [10.23919/DATE58400.2024.10546670].
Optimizing Offload Performance in Heterogeneous MPSoCs
Benini L.
2024
Abstract
Heterogeneous multi-core architectures combine a few 'host' cores, optimized for single-thread performance, with many small, energy-efficient 'accelerator' cores for data-parallel processing, on a single chip. Offloading a computation to the many-core acceleration fabric introduces a communication and synchronization cost that reduces the speedup attainable on the accelerator, particularly for small and fine-grained parallel tasks. We demonstrate that by co-designing the hardware and offload routines, we can increase the speedup of an offloaded DAXPY kernel by as much as 47.9%. Furthermore, we show that it is possible to accurately model the runtime of an offloaded application, accounting for the offload overheads, with a mean absolute percentage error (MAPE) as low as 1%, enabling optimal offload decisions under offload execution time constraints.
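For readers unfamiliar with the terms in the abstract, the following is a minimal Python sketch of the DAXPY operation (y ← a·x + y), the MAPE metric, and a simple offload decision. The runtime model here (a fixed offload overhead plus work split evenly across accelerator cores) is an assumption made for illustration only; this record does not describe the model used in the paper. The function names `offload_runtime` and `should_offload` and all their parameters are hypothetical.

```python
def daxpy(a, x, y):
    """Return a*x + y element-wise (the DAXPY operation)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


def mape(measured, predicted):
    """Mean absolute percentage error between two series, in percent."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


def offload_runtime(n, overhead, cycles_per_element, cores):
    """Hypothetical model (not the paper's): fixed offload overhead
    plus the per-element work divided evenly among accelerator cores."""
    return overhead + cycles_per_element * n / cores


def should_offload(n, host_cycles_per_element, overhead,
                   cycles_per_element, cores):
    """Offload only if the modeled accelerator runtime beats the host."""
    host_runtime = host_cycles_per_element * n
    return offload_runtime(n, overhead, cycles_per_element, cores) < host_runtime
```

Under this assumed model, a small problem can fail to amortize the fixed overhead while a larger one benefits from offloading, which is the effect the abstract describes for small and fine-grained tasks. For example, with made-up values of 500 overhead cycles, 2 cycles per element on 8 accelerator cores, and 1 cycle per element on the host, `should_offload(100, 1, 500, 2, 8)` is false and `should_offload(1000, 1, 500, 2, 8)` is true.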