Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster

Bertuletti, Marco; Riedel, Samuel; Zhang, Yichao; Vanelli-Coralli, Alessandro; Benini, Luca

doi:10.1007/978-3-031-46077-7_16

Synchronization is likely the most critical performance killer in shared-memory parallel programs. With the rise of multi-core and many-core processors, the relative impact of synchronization on performance and energy overhead is bound to grow. This paper focuses on barrier synchronization for TeraPool, a cluster of 1024 RISC-V processors with non-uniform memory access to a tightly coupled 4 MB shared L1 data memory. We compare the synchronization strategies available in other multi-core and many-core clusters to identify the optimal native barrier kernel for TeraPool. We benchmark a set of optimized barrier implementations and evaluate their performance in the framework of the widespread fork-join, OpenMP-style programming model. We test parallel kernels from the signal-processing and telecommunications domains, achieving less than 10% synchronization overhead over the total runtime for problems that fit TeraPool’s L1 memory. By fine-tuning our tree barriers, we achieve a 1.6× speed-up with respect to a naive central-counter barrier and just 6.2% overhead on a typical 5G application, including a challenging multistage synchronization kernel. To our knowledge, this is the first work where shared-memory barriers are used for the synchronization of a thousand processing elements tightly coupled to shared data memory.

Bertuletti, M., Riedel, S., Zhang, Y., Vanelli-Coralli, A., Benini, L. (2023). Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster [10.1007/978-3-031-46077-7_16].



Year: 2023
Volume title: SAMOS 2023: Embedded Computer Systems: Architectures, Modeling, and Simulation
First page: 241
Last page: 254
Series: Lecture Notes in Computer Science
DOI: https://dx.doi.org/10.1007/978-3-031-46077-7_16


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/959278
