Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used – with a crossbarlike medium inside each cluster and a network-on-chip (NoC) at the global level – which make memory operations nonuniform (NUMA). Nested parallelism represents a powerful programming abstraction for these architectures, where a first level of parallelism can be used to distribute coarse-grained tasks to clusters, and additional levels of fine-grained parallelism can be distributed to processors within a cluster. This paper presents a lightweight and highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the costs to enable multi-level parallelization and demonstrate that our techniques allow to extract high degrees of parallelism.

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores / A. Marongiu; P. Burgio; L. Benini. - STAMPA. - (2012), pp. 105-110. (Intervento presentato al convegno Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012 tenutosi a Dresden nel 12-16 March 2012) [10.1109/DATE.2012.6176441].

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores

MARONGIU, ANDREA;BURGIO, PAOLO;BENINI, LUCA
2012

Abstract

Several recent many-core accelerators have been architected as fabrics of tightly-coupled shared memory clusters. A hierarchical interconnection system is used – with a crossbarlike medium inside each cluster and a network-on-chip (NoC) at the global level – which make memory operations nonuniform (NUMA). Nested parallelism represents a powerful programming abstraction for these architectures, where a first level of parallelism can be used to distribute coarse-grained tasks to clusters, and additional levels of fine-grained parallelism can be distributed to processors within a cluster. This paper presents a lightweight and highly optimized support for nested parallelism on cluster-based embedded many-cores. We assess the costs to enable multi-level parallelization and demonstrate that our techniques allow to extract high degrees of parallelism.
2012
Proceedings of Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012
105
110
Fast and lightweight support for nested parallelism on cluster-based embedded many-cores / A. Marongiu; P. Burgio; L. Benini. - STAMPA. - (2012), pp. 105-110. (Intervento presentato al convegno Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012 tenutosi a Dresden nel 12-16 March 2012) [10.1109/DATE.2012.6176441].
A. Marongiu; P. Burgio; L. Benini
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/126270
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 22
  • ???jsp.display-item.citation.isi??? 12
social impact