Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but staggering performance requirements demand a high number of PEs coupled with extreme Power, Performance and Area (PPA) efficiency. We present the architecture, design, and full physical implementation of Terapool-SDR, a cluster for Software Defined Radio (SDR) with 1024 latency-tolerant, compact RV32 PEs, sharing a global view of a 4 MiB, 4096-banked, L1 memory. We report various feasible configurations of TeraPool-SDR featuring an ultra-high bandwidth PE-to-L1-memory interconnect, clocked at 730 MHz, 880 MHz, and 924 MHz (TT/0.80 V/25 °C) in 12 nm FinFET technology. The TeraPool-SDR cluster achieves high energy efficiency on all SDR key kernels for 5G RANs: Fast Fourier Transform (93 GOPS/W), Matrix-Multiplication (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W). For all the kernels, it consumes less than 10 W, in compliance with industry standards.
Zhang, Y., Bertuletti, M., Riedel, S., Cavalcante, M., Vanelli-Coralli, A., Benini, L. (2024). TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES : Association for Computing Machinery [10.1145/3649476.3658735].
TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios
Vanelli-Coralli, Alessandro;Benini, Luca
2024
Abstract
Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but staggering performance requirements demand a high number of PEs coupled with extreme Power, Performance and Area (PPA) efficiency. We present the architecture, design, and full physical implementation of Terapool-SDR, a cluster for Software Defined Radio (SDR) with 1024 latency-tolerant, compact RV32 PEs, sharing a global view of a 4 MiB, 4096-banked, L1 memory. We report various feasible configurations of TeraPool-SDR featuring an ultra-high bandwidth PE-to-L1-memory interconnect, clocked at 730 MHz, 880 MHz, and 924 MHz (TT/0.80 V/25 °C) in 12 nm FinFET technology. The TeraPool-SDR cluster achieves high energy efficiency on all SDR key kernels for 5G RANs: Fast Fourier Transform (93 GOPS/W), Matrix-Multiplication (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W). For all the kernels, it consumes less than 10 W, in compliance with industry standards.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.