Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters

Cavalcante, Matheus; Wüthrich, Domenic; Perotti, Matteo; Riedel, Samuel; Benini, Luca

doi:10.1145/3508352.3549367

While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should be. Architecting PEs as vector processors holds the promise to greatly reduce their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector machines include microarchitectural tricks to improve the Instruction Level Parallelism (ILP), which increases their instruction fetch and decode energy overhead. In this paper, we explore for the first time vector processing as an option to build small and efficient PEs for large-scale shared-L1 clusters. We propose Spatz, a compact, modular 32-bit vector processing unit based on the integer embedded subset of the RISC-V Vector Extension version 1.0. A Spatz-based cluster with four Multiply-Accumulate Units (MACUs) needs only 7.9 pJ per 32-bit integer multiply-accumulate operation, 40% less energy than an equivalent cluster built with four Snitch scalar cores. We analyzed Spatz' performance by integrating it within MemPool, a large-scale many-core shared-L1 cluster. The Spatz-based MemPool system achieves up to 285 GOPS when running a 256 x 256 32-bit integer matrix multiplication, 70% more than the equivalent Snitch-based MemPool system. In terms of energy efficiency, the Spatz-based MemPool system achieves up to 266 GOPS/W when running the same kernel, more than twice the energy efficiency of the Snitch-based MemPool system, which reaches 128 GOPS/W. Those results show the viability of lean vector processors as high-performance and energy-efficient PEs for large-scale clusters with tightly-coupled L1 memory.

Cavalcante, M., Wüthrich, D., Perotti, M., Riedel, S., Benini, L. (2022). Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters. 345 E 47TH ST, NEW YORK, NY 10017 USA : IEEE [10.1145/3508352.3549367].

Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters

Cavalcante, Matheus;Wüthrich, Domenic;Perotti, Matteo;Riedel, Samuel;Benini, Luca

2022

Abstract

While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should be. Architecting PEs as vector processors holds the promise to greatly reduce their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector machines include microarchitectural tricks to improve the Instruction Level Parallelism (ILP), which increases their instruction fetch and decode energy overhead. In this paper, we explore for the first time vector processing as an option to build small and efficient PEs for large-scale shared-L1 clusters. We propose Spatz, a compact, modular 32-bit vector processing unit based on the integer embedded subset of the RISC-V Vector Extension version 1.0. A Spatz-based cluster with four Multiply-Accumulate Units (MACUs) needs only 7.9 pJ per 32-bit integer multiply-accumulate operation, 40% less energy than an equivalent cluster built with four Snitch scalar cores. We analyzed Spatz' performance by integrating it within MemPool, a large-scale many-core shared-L1 cluster. The Spatz-based MemPool system achieves up to 285 GOPS when running a 256 x 256 32-bit integer matrix multiplication, 70% more than the equivalent Snitch-based MemPool system. In terms of energy efficiency, the Spatz-based MemPool system achieves up to 266 GOPS/W when running the same kernel, more than twice the energy efficiency of the Snitch-based MemPool system, which reaches 128 GOPS/W. Those results show the viability of lean vector processors as high-performance and energy-efficient PEs for large-scale clusters with tightly-coupled L1 memory.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2022
			
	Titolo del volume
	
				2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD)
			
	Pagina iniziale
	
				1
			
	Pagina finale
	
				9
			
	Codice DOI
	
				https://dx.doi.org/10.1145/3508352.3549367
			
	Citazione
	
				Cavalcante, M., Wüthrich, D., Perotti, M., Riedel, S., Benini, L. (2022). Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters. 345 E 47TH ST, NEW YORK, NY 10017 USA : IEEE [10.1145/3508352.3549367].
			
	Tutti gli autori
	
						Cavalcante, Matheus; Wüthrich, Domenic; Perotti, Matteo; Riedel, Samuel; Benini, Luca

File in questo prodotto:

Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/959401

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

24

21

ND

CRIS Current Research Information System