Assessing Tenstorrent’s RISC-V MatMul Acceleration Capabilities

Pizzini Cavagna, Hiari (first author; Writing – Original Draft Preparation);
Bartolini, Andrea (last author; Writing – Review & Editing)
2025

Abstract

The increasing demand for generative AI services such as Large Language Models (LLMs) has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumption. This paper evaluates the performance of the Tenstorrent Grayskull e75 RISC-V accelerator on basic linear algebra kernels at reduced numerical precision, a fundamental operation in LLM computations. We present a detailed characterization of how Grayskull's execution model, grid size, matrix dimensions, data formats, and numerical precision affect computational efficiency. Furthermore, we compare Grayskull's performance against state-of-the-art architectures with tensor acceleration, including Intel Sapphire Rapids processors and two NVIDIA GPUs (V100 and A100). Whilst NVIDIA GPUs dominate raw performance, Grayskull demonstrates a competitive trade-off between power consumption and computational throughput, reaching a peak of 1.55 TFLOPs/Watt with BF16.
High Performance Computing (ISC High Performance 2025), pp. 123-134
Pizzini Cavagna, H., Cesarini, D., Bartolini, A. (2025). Assessing Tenstorrent’s RISC-V MatMul Acceleration Capabilities. Springer Nature [10.1007/978-3-032-07612-0_10].
Pizzini Cavagna, Hiari; Cesarini, Daniele; Bartolini, Andrea

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1032111