Many-core heterogeneous designs are nowadays widely available among embedded systems. Initiatives such as the HSA push for a model where the host processor and the accelerator(s) communicate via coherent, Unified Virtual Memory (UVM). In this paper we describe our experience in porting the OpenMP v4 programming model to a low-end, heterogeneous embedded system based on the PULP many-core accelerator featuring lightweight (software-managed) UVM support. We describe a GCC-based toolchain which enables: i) the automatic generation of host and accelerator binaries from a single, high-level, OpenMP parallel program; ii) the automatic instrumentation of the accelerator program to transparently manage UVM. This enables up to 4× faster execution compared to traditional copy-based offload mechanisms.
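As a rough illustration of what a "single, high-level, OpenMP parallel program" with an offloaded region can look like, the following is a minimal sketch using the standard OpenMP 4 `target` construct. The function name `saxpy_offload` and the kernel itself are hypothetical and are not taken from the paper. The abstract does not say how the described toolchain treats `map` clauses, so the comments only describe the generic OpenMP semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not from the paper): y[i] = a * x[i] + y[i].
 * The same source is meant to be compiled for both host and accelerator. */
void saxpy_offload(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL && n >= 0);

    /* Standard OpenMP 4 offload: the map clauses name the data the
     * accelerator needs. A copy-based runtime would typically transfer
     * x to the device and y in both directions. With shared virtual
     * memory, the device could instead access the host buffers directly
     * (assumption: the exact behaviour depends on the toolchain). */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Built without OpenMP support, or with OpenMP but no offload device available, the pragmas are ignored or the region falls back to the host, so the function still computes the same result.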
Capotondi, A., Marongiu, A. (2017). Enabling zero-copy OpenMP offloading on the PULP many-core accelerator. Association for Computing Machinery, Inc [10.1145/3078659.3079071].
Enabling zero-copy OpenMP offloading on the PULP many-core accelerator
Capotondi, Alessandro;Marongiu, Andrea
2017