Assessing Tenstorrent’s RISC-V MatMul Acceleration Capabilities

Pizzini Cavagna, Hiari (first author; Writing – Original Draft Preparation); Cesarini, Daniele; Bartolini, Andrea (last author; Writing – Review & Editing)

2025

Abstract

The increasing demand for generative AI as Large Language Models (LLMs) services has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumption. This paper evaluates the performance of the Tenstorrent Grayskull e75 RISC-V accelerator for basic linear algebra kernels at reduced numerical precision, a fundamental operation in LLM computations. We present a detailed characterization of Grayskull’s execution model, grid size, matrix dimensions, data formats, and numerical precision impact on computational efficiency. Furthermore, we compare Grayskull’s performance against state-of-the-art architectures with tensor acceleration, including Intel Sapphire Rapids processors and two NVIDIA GPUs (V100 and A100). Whilst NVIDIA GPUs dominate raw performance, Grayskull demonstrates a competitive trade-off between power consumption and computational throughput, reaching a peak of 1.55 TFLOPs/Watt with BF16.

Year: 2025
Volume title: High Performance Computing (ISC High Performance 2025)
First page: 123
Last page: 134
Series: Lecture Notes in Computer Science
DOI: https://dx.doi.org/10.1007/978-3-032-07612-0_10
Citation: Pizzini Cavagna, H., Cesarini, D., Bartolini, A. (2025). Assessing Tenstorrent’s RISC-V MatMul Acceleration Capabilities. Springer Nature [10.1007/978-3-032-07612-0_10].
All authors: Pizzini Cavagna, Hiari; Cesarini, Daniele; Bartolini, Andrea

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1032111

Warning: the data displayed have not been validated by the university.
