Enabling Fine-Grained OpenMP Tasking on Tightly-Coupled Shared Memory Clusters / Paolo Burgio; Giuseppe Tagliavini; Andrea Marongiu; Luca Benini. - PRINT. - (2013), pp. 1504-1509. (Paper presented at the Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, held in Grenoble, France, 18-22 March 2013) [10.7873/DATE.2013.306].

Enabling Fine-Grained OpenMP Tasking on Tightly-Coupled Shared Memory Clusters

BURGIO, PAOLO;TAGLIAVINI, GIUSEPPE;MARONGIU, ANDREA;BENINI, LUCA
2013

Abstract

Cluster-based architectures are increasingly being adopted to design embedded many-cores. These platforms can deliver very high peak performance within a contained power envelope, provided that programmers can make effective use of the available parallel cores. This is becoming an extremely difficult task, as embedded applications are growing in complexity and exhibit irregular and dynamic parallelism. The OpenMP tasking extensions represent a powerful abstraction to capture this form of parallelism. However, supporting them efficiently on cluster-based embedded SoCs is not easy, because the fine-grained parallel workload present in embedded applications cannot tolerate high memory and run-time overheads. In this paper we present our design of the runtime support layer for OpenMP tasking on an embedded shared memory cluster, identifying key aspects for achieving performance and discussing important architectural support for removing major bottlenecks.
Venue: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
Pages: 1504-1509

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/303735

Citations
  • PMC: ND
  • Scopus: 21
  • ISI: 11