MemPool Meets Systolic: Flexible Systolic Computation in a Large Shared-Memory Processor Cluster

Benini, Luca
2023

Abstract

Systolic arrays and shared-memory manycore clusters are two widely used architectural templates that offer vastly different trade-offs. Systolic arrays achieve exceptional performance for workloads with regular dataflow at the cost of a rigid architecture and programming model. Shared-memory manycore systems are more flexible and easier to program, but data must be moved explicitly to and from the cores. This work combines the best of both worlds by adding a systolic overlay to a general-purpose shared-memory manycore cluster, allowing for efficient systolic execution while maintaining flexibility. We propose and implement two instruction set architecture (ISA) extensions that enable native and automatic communication between cores through shared memory. Our hybrid approach allows configuring different systolic topologies at execution time and running hybrid systolic-shared-memory computations. The hybrid architecture's convolution kernel outperforms the optimized shared-memory one by 18%.
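The abstract does not describe the two ISA extensions in detail, so the following is only a hypothetical software model of the general idea it summarizes: processing elements (PEs) arranged as a weight-stationary systolic chain that exchange data exclusively through bounded FIFO queues, which on a shared-memory cluster would reside in shared memory. The model computes a valid 1D convolution in the cross-correlation form y[i] = sum_k w[k] * x[i + k]. The queue depth QCAP, the names BoundedQueue, systolic_conv1d and pe_step, and the round-robin scheduling loop are all assumptions made for illustration; this is not the authors' kernel, ISA, or hardware.

```python
from collections import deque

QCAP = 4  # hypothetical depth of every inter-PE queue (assumption)


class BoundedQueue:
    """Bounded FIFO standing in for a queue that would live in shared memory."""

    def __init__(self, cap=QCAP):
        self.cap = cap
        self.items = deque()

    def full(self):
        return len(self.items) >= self.cap

    def empty(self):
        return not self.items

    def push(self, value):
        assert not self.full(), "push to a full queue"
        self.items.append(value)

    def pop(self):
        return self.items.popleft()


def systolic_conv1d(x, w):
    """Compute y[i] = sum_k w[k] * x[i + k] (valid length) on a chain of
    len(w) weight-stationary PEs that communicate only through queues."""
    n, taps = len(x), len(w)
    m = n - taps + 1                                   # number of outputs
    qx = [BoundedQueue() for _ in range(taps)]         # input stream into PE k
    qps = [BoundedQueue() for _ in range(taps + 1)]    # partial sums into PE k
    popped = [0] * taps                                # x values consumed per PE
    fed_x = fed_ps = 0
    y = []

    def pe_step(k):
        """Try one non-blocking step of PE k; return True if it made progress."""
        j = popped[k]                   # PE k sees the stream x[k], x[k+1], ...
        if j >= n - k or qx[k].empty():
            return False
        use = j < m                     # x[j + k] contributes to output j
        fwd = k + 1 < taps and j >= 1   # pass on every value except the first
        if use and (qps[k].empty() or qps[k + 1].full()):
            return False
        if fwd and qx[k + 1].full():
            return False
        xv = qx[k].pop()
        if use:
            qps[k + 1].push(qps[k].pop() + w[k] * xv)
        if fwd:
            qx[k + 1].push(xv)
        popped[k] += 1
        return True

    while len(y) < m:
        progress = False
        # Source: feed inputs and zero partial sums into the head of the chain.
        if fed_x < n and not qx[0].full():
            qx[0].push(x[fed_x])
            fed_x += 1
            progress = True
        if fed_ps < m and not qps[0].full():
            qps[0].push(0)
            fed_ps += 1
            progress = True
        # Every PE attempts one step per round, like one "cycle" of the array.
        for k in range(taps):
            progress |= pe_step(k)
        # Sink: drain finished outputs from the tail of the chain.
        if not qps[taps].empty():
            y.append(qps[taps].pop())
            progress = True
        if not progress:
            raise RuntimeError("deadlock: no PE could make progress")
    return y


print(systolic_conv1d([1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3]))
# [14, 20, 26, 32, 38, 44]
```

In this model, each PE keeps one weight stationary, forwards the input stream to its right neighbour, and adds its contribution to the partial sum travelling alongside it. Because a PE only stalls on an empty input queue or a full output queue, the chain pipelines without any global synchronization, and changing the topology only means changing which queues each PE reads and writes.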
2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Riedel, S., Khov, G.H., Mazzola, S., Cavalcante, M., Andri, R., Benini, L. (2023). MemPool Meets Systolic: Flexible Systolic Computation in a Large Shared-Memory Processor Cluster [10.23919/DATE56975.2023.10136909].
Riedel, Samuel; Khov, Gua Hao; Mazzola, Sergio; Cavalcante, Matheus; Andri, Renzo; Benini, Luca
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/958545