In this article, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX fully depleted silicon-on-insulator (FD-SOI) technology. Ara's microarchitecture is scalable, as it is composed of a set of identical lanes, each containing part of the processor's vector register file and functional units. It achieves up to 97% floating-point unit (FPU) utilization when running a 256×256 double-precision matrix multiplication on 16 lanes. Ara runs at more than 1 GHz in the typical corner (TT/0.80 V/25 °C), achieving a performance up to 33 DP-GFLOPS. In terms of energy efficiency, Ara achieves up to 41 DP-GFLOPS W-1 under the same conditions, which is slightly superior to similar vector processors found in the literature. An analysis on several vectorizable linear algebra computation kernels for a range of different matrix and vector sizes gives insight into performance limitations and bottlenecks for vector processors and outlines directions to maintain high energy efficiency even for small matrix sizes where the vector architecture achieves suboptimal utilization of the available FPUs.

Cavalcante M., Schuiki F., Zaruba F., Schaffner M., Benini L. (2020). Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multiprecision Floating-Point Support in 22-nm FD-SOI. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 28(2), 530-543 [10.1109/TVLSI.2019.2950087].

Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multiprecision Floating-Point Support in 22-nm FD-SOI

Benini L.
2020

Abstract

In this article, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX fully depleted silicon-on-insulator (FD-SOI) technology. Ara's microarchitecture is scalable, as it is composed of a set of identical lanes, each containing part of the processor's vector register file and functional units. It achieves up to 97% floating-point unit (FPU) utilization when running a 256×256 double-precision matrix multiplication on 16 lanes. Ara runs at more than 1 GHz in the typical corner (TT/0.80 V/25 °C), achieving a performance up to 33 DP-GFLOPS. In terms of energy efficiency, Ara achieves up to 41 DP-GFLOPS W-1 under the same conditions, which is slightly superior to similar vector processors found in the literature. An analysis on several vectorizable linear algebra computation kernels for a range of different matrix and vector sizes gives insight into performance limitations and bottlenecks for vector processors and outlines directions to maintain high energy efficiency even for small matrix sizes where the vector architecture achieves suboptimal utilization of the available FPUs.
2020
Cavalcante M., Schuiki F., Zaruba F., Schaffner M., Benini L. (2020). Ara: A 1-GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multiprecision Floating-Point Support in 22-nm FD-SOI. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 28(2), 530-543 [10.1109/TVLSI.2019.2950087].
Cavalcante M.; Schuiki F.; Zaruba F.; Schaffner M.; Benini L.
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/791957
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 69
  • ???jsp.display-item.citation.isi??? 51
social impact