Capotondi, A., Marongiu, A. (2016). On the effectiveness of OpenMP teams for cluster-based many-core accelerators. Institute of Electrical and Electronics Engineers Inc. [10.1109/HPCSim.2016.7568399].

On the effectiveness of OpenMP teams for cluster-based many-core accelerators

Capotondi, Alessandro; Marongiu, Andrea
2016

Abstract

With the introduction of more powerful and massively parallel embedded processors, embedded systems are becoming HPC-capable. Heterogeneous systems-on-chip (SoCs) that couple a general-purpose host processor to a many-core accelerator are becoming increasingly widespread, and they provide tremendous peak performance per watt, well suited to executing HPC-class programs. However, this increased computational potential comes at the cost of ease of programming. Application developers are required to manually outline the code parts suitable for acceleration, parallelize them efficiently over the many available cores, and orchestrate data transfers to and from the accelerator. In addition, since most many-cores are organized as a collection of clusters, featuring fast local communication but slow remote communication (i.e., to another cluster's local memory), the programmer should also take care to map the parallel computation properly so as to avoid poor data locality. OpenMP v4.0 introduces new constructs for computation offloading, as well as directives to deploy parallel computation in a cluster-aware manner. In this paper we assess the effectiveness of OpenMP v4.0 at exploiting the massive parallelism available in embedded heterogeneous SoCs, comparing it against standard parallel loops on several computation-intensive applications from the linear algebra and image processing domains.
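To make the comparison concrete, the C sketch below contrasts the two styles on a hypothetical dense matrix-vector product (the kernel, the names `matvec_parallel_for`, `matvec_teams`, `init_inputs`, and the size `N` are illustrative assumptions, not code from the paper). The first function offloads a standard parallel loop: `target` moves execution and data (via `map` clauses) to the device, and `parallel for` spreads iterations over a single team of threads. The second uses the OpenMP 4.0 `teams distribute parallel for` combination: `distribute` splits the rows among a league of teams, and `parallel for` splits each team's rows among that team's threads. Because threads in different teams cannot synchronize with each other, a runtime may bind one team per cluster; that binding is assumed here rather than mandated by the specification.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical problem size for this sketch. */
#define N 64

/* Fill A (n x n, row-major) and x with small integer-valued floats so that
 * results are exact and easy to check. */
void init_inputs(float *A, float *x, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        x[i] = 1.0f;
        for (int j = 0; j < n; j++)
            A[i * n + j] = (float)((i + j) % 3);
    }
}

/* Baseline: a standard parallel loop inside a target region. All worker
 * threads form one team, so the code carries no hint about cluster
 * boundaries on the accelerator. */
void matvec_parallel_for(const float *A, const float *x, float *y, int n)
{
    #pragma omp target map(to: A[0:n*n], x[0:n]) map(from: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* OpenMP 4.0 teams variant: rows are distributed across `nteams` teams,
 * then parallelized among the threads of each team. Mapping one team to
 * one cluster is an assumption about the runtime, not a guarantee. */
void matvec_teams(const float *A, const float *x, float *y, int n, int nteams)
{
    #pragma omp target teams distribute parallel for num_teams(nteams) \
        map(to: A[0:n*n], x[0:n]) map(from: y[0:n])
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}
```

Built with an OpenMP-capable compiler (e.g. `-fopenmp` with GCC or Clang), both functions should produce identical `y`; when no accelerator is present, target regions fall back to host execution, and a compiler that ignores the pragmas simply runs the loops sequentially. Choosing `nteams` equal to the number of clusters is the natural starting point for the cluster-aware variant.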
2016 International Conference on High Performance Computing and Simulation, HPCS 2016, pp. 667-674
Files in this item: capotondi_HPCS16.pdf (postprint paper, open access under a free-access license, Adobe PDF, 341.2 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/575144