MemPool Meets Systolic: Flexible Systolic Computation in a Large Shared-Memory Processor Cluster

Riedel, Samuel; Khov, Gua Hao; Mazzola, Sergio; Cavalcante, Matheus; Andri, Renzo; Benini, Luca

doi:10.23919/DATE56975.2023.10136909

Systolic arrays and shared-memory manycore clusters are two widely used architectural templates that offer vastly different trade-offs. Systolic arrays achieve exceptional performance for workloads with regular dataflow at the cost of a rigid architecture and programming model. Shared-memory manycore systems are more flexible and easy to program, but data must be moved explicitly to/from cores. This work combines the best of both worlds by adding a systolic overlay to a general-purpose shared-memory manycore cluster allowing for efficient systolic execution while maintaining flexibility. We propose and implement two instruction set architecture extensions enabling native and automatic communication between cores through shared memory. Our hybrid approach allows configuring different systolic topologies at execution time and running hybrid systolic-shared-memory computations. The hybrid architecture's convolution kernel outperforms the optimized shared-memory one by 18%.
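
As a rough illustration of the idea in the abstract (cores arranged as a systolic chain that exchange operands through queues held in shared memory), the following Python sketch simulates a small weight-stationary 1D convolution. It is not the paper's implementation. The function name `systolic_fir`, the use of `collections.deque` as a stand-in for queues in shared memory, and the one-sample delay register per processing element are all assumptions made for this example, and the proposed ISA extensions are not modeled.

```python
from collections import deque


def systolic_fir(samples, weights):
    """Simulate a linear systolic chain computing y[n] = sum_k w[k] * x[n-k].

    Hypothetical model: PE k holds weights[k] and talks to its neighbours
    only through FIFO queues (deques standing in for shared-memory queues).
    """
    n_pe = len(weights)
    # queues[k] feeds PE k; queues[n_pe] collects the final results.
    queues = [deque() for _ in range(n_pe + 1)]
    # One-sample delay register per PE, initialised to zero.
    delay = [0] * n_pe

    # Inject the input stream together with an empty partial sum.
    for x in samples:
        queues[0].append((x, 0))

    # Each PE drains its input queue: it adds its own contribution to the
    # partial sum and forwards the previously seen sample to the next PE.
    for pe in range(n_pe):
        while queues[pe]:
            x, partial = queues[pe].popleft()
            queues[pe + 1].append((delay[pe], partial + weights[pe] * x))
            delay[pe] = x

    return [partial for _, partial in queues[n_pe]]


if __name__ == "__main__":
    print(systolic_fir([1, 2, 3, 4], [1, 10, 100]))
```

In this sketch the PEs run one after another for simplicity. On real hardware they would run concurrently, and the queues would provide the synchronization between neighbouring cores.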

Riedel, S., Khov, G.H., Mazzola, S., Cavalcante, M., Andri, R., Benini, L. (2023). MemPool Meets Systolic: Flexible Systolic Computation in a Large Shared-Memory Processor Cluster [10.23919/DATE56975.2023.10136909].

Year: 2023
Volume title: 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)
DOI: https://dx.doi.org/10.23919/DATE56975.2023.10136909

Files in this record:

Attachments, if any, are not displayed.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/958545

Warning: the data displayed have not been validated by the university.
