We present HeartStream, a 64-RV-core shared-L1-memory cluster (410 GFLOP/s peak performance and 204.8 GBps L1 bandwidth) for energy-efficient AI-enhanced O-RAN. The cores and cluster architecture are customized for baseband processing, supporting complex (16-bit real&imaginary) instructions: multiply&accumulate, division&square-root, SIMD instructions, and hardware-managed systolic queues, improving up to 1.89 × the energy efficiency of key baseband kernels. At 800 [email protected] V, HeartStream delivers up to 243 GFLOP/s on complex-valued wireless workloads. Furthermore, the cores also support efficient AI processing on received data at up to 72 GOP/s. HeartStream is fully compatible with base station power and processing latency limits: it achieves leading-edge software-defined PUSCH efficiency (49.6 GFLOP/s/W) and consumes just 0.68 W(645 [email protected] V, within the 4 ms end-to-end constraint for B5G/6G uplink.

Zhang, Y., Bertuletti, M., Mazzola, S., Riedel, S., Benini, L. (2025). A 410 GFLOP/s, 64 RISC-V Cores, 204.8 GBps Shared-Memory Cluster in 12 nm FinFET with Systolic Execution Support for Efficient B5G/6G AI-Enhanced O-RAN. IEEE Computer Society [10.1109/esserc66193.2025.11214075].

A 410 GFLOP/s, 64 RISC-V Cores, 204.8 GBps Shared-Memory Cluster in 12 nm FinFET with Systolic Execution Support for Efficient B5G/6G AI-Enhanced O-RAN

Benini, Luca
2025

Abstract

We present HeartStream, a 64-RV-core shared-L1-memory cluster (410 GFLOP/s peak performance and 204.8 GBps L1 bandwidth) for energy-efficient AI-enhanced O-RAN. The cores and cluster architecture are customized for baseband processing, supporting complex (16-bit real&imaginary) instructions: multiply&accumulate, division&square-root, SIMD instructions, and hardware-managed systolic queues, improving up to 1.89 × the energy efficiency of key baseband kernels. At 800 [email protected] V, HeartStream delivers up to 243 GFLOP/s on complex-valued wireless workloads. Furthermore, the cores also support efficient AI processing on received data at up to 72 GOP/s. HeartStream is fully compatible with base station power and processing latency limits: it achieves leading-edge software-defined PUSCH efficiency (49.6 GFLOP/s/W) and consumes just 0.68 W(645 [email protected] V, within the 4 ms end-to-end constraint for B5G/6G uplink.
2025
European Solid-State Circuits Conference
401
404
Zhang, Y., Bertuletti, M., Mazzola, S., Riedel, S., Benini, L. (2025). A 410 GFLOP/s, 64 RISC-V Cores, 204.8 GBps Shared-Memory Cluster in 12 nm FinFET with Systolic Execution Support for Efficient B5G/6G AI-Enhanced O-RAN. IEEE Computer Society [10.1109/esserc66193.2025.11214075].
Zhang, Yichao; Bertuletti, Marco; Mazzola, Sergio; Riedel, Samuel; Benini, Luca
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/1040905
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact