Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators

Tagliavini, Giuseppe; Haugou, Germain; Marongiu, Andrea; Benini, Luca

doi:10.1007/s11554-015-0544-0

In recent years, image processing has been a key application area for mobile and embedded computing platforms. In this context, many-core accelerators are a viable solution to efficiently execute highly parallel kernels. However, architectural constraints impose hard limits on the main memory bandwidth, and push for software techniques that optimize the memory usage of complex multi-kernel applications. In this work, we propose a set of techniques, mainly based on graph analysis and image tiling, targeted to accelerate the execution of image processing applications expressed as standard OpenVX graphs on cluster-based many-core accelerators. We have developed a run-time framework that implements these techniques using a front-end compliant with the OpenVX standard. The framework is based on an OpenCL extension that enables more explicit control and efficient reuse of on-chip memory and greatly reduces the recourse to off-chip memory for storing intermediate results. Experiments performed on the STHORM many-core accelerator demonstrate that our approach leads to a massive reduction in execution time and bandwidth, even when the main memory bandwidth for the accelerator is severely constrained.

Tagliavini, G., Haugou, G., Marongiu, A., Benini, L. (2018). Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators. Journal of Real-Time Image Processing, 15(1), 73-92 [10.1007/s11554-015-0544-0].

Year: 2018
Journal: Journal of Real-Time Image Processing
DOI: https://dx.doi.org/10.1007/s11554-015-0544-0
Publication type: 1.01 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/589164
