Cesarini, D., Marongiu, A., Benini, L. (2016). An optimized task-based runtime system for resource-constrained parallel accelerators. Institute of Electrical and Electronics Engineers Inc.

An optimized task-based runtime system for resource-constrained parallel accelerators

Cesarini, Daniele; Marongiu, Andrea; Benini, Luca

2016

Abstract

Manycore accelerators have recently proven to be a promising solution for increasingly powerful and energy-efficient computing systems. This raises the need for parallel programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering flexible support for fine-grained and irregular parallelism. However, efficiently supporting this programming paradigm on resource-constrained parallel accelerators is a challenging task. In this paper, we present an optimized implementation of the OpenMP tasking model for embedded parallel accelerators, discussing the key design solutions that guarantee a small memory footprint and minimize performance overheads. We validate our design by comparing it to several state-of-the-art tasking implementations, using the most representative parallelization patterns. The experimental results confirm that our solution achieves near-ideal speedups for tasks as small as 5K cycles.

Year: 2016
Volume title: Proceedings of the 2016 Design, Automation and Test in Europe Conference and Exhibition, DATE 2016
First page: 1261
Last page: 1266
Citation: Cesarini, D., Marongiu, A., Benini, L. (2016). An optimized task-based runtime system for resource-constrained parallel accelerators. Institute of Electrical and Electronics Engineers Inc.
All authors: Cesarini, Daniele; Marongiu, Andrea; Benini, Luca
Appears in types: 4.01 Conference proceedings contribution

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/613664
