Andrea Marongiu, Alessandro Capotondi, Giuseppe Tagliavini, Luca Benini (2013). Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP. New York: ACM [10.1145/2489068.2489069].
Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP
Marongiu, Andrea; Capotondi, Alessandro; Tagliavini, Giuseppe; Benini, Luca
2013
Abstract
Heterogeneous architectures based on one fast-clocked, moderately multicore "host" processor plus a many-core accelerator represent one promising way to satisfy the ever-increasing GOps/W requirements of embedded systems-on-chip. However, heterogeneous computing comes at the cost of increased programming complexity, requiring major rewrites of applications in a low-level programming style (e.g., OpenCL). In this paper we present a programming model, compiler and runtime system for a prototype board from STMicroelectronics featuring an ARM9 host and a STHORM many-core accelerator. The programming model is based on OpenMP, with additional directives to efficiently program the accelerator from a single host program. The proposed multi-ISA compilation toolchain hides the entire process of outlining an accelerator program, compiling and loading it onto the STHORM platform, and implementing data sharing between the host and the accelerator. Our experimental results show that we achieve performance very close to that of hand-optimized OpenCL code, at significantly lower programming complexity.
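To give a feel for what directive-based offload from a single host program looks like, here is a minimal sketch in C. It is an assumption-laden illustration, not the paper's syntax: the abstract does not show the additional directives, so the sketch uses the standard `target` construct from OpenMP 4.0 and later as a stand-in. The function name `scale_add` and the kernel itself are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i] for i in [0, n). */
void scale_add(float a, const float *x, float *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);

    /*
     * Stand-in for the paper's offload directives (assumption): the
     * standard OpenMP `target` construct marks the region to be outlined
     * and run on the accelerator. The map clauses describe data sharing:
     * x is copied to the device, y is copied in and back out.
     */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The host code calls `scale_add` like any other function; outlining, compilation for the accelerator, and data transfer are left to the toolchain. If the compiler has no OpenMP or offload support, the pragmas are ignored and the loop simply runs on the host.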