FlooNoC: A 645-Gb/s/link 0.15-pJ/B/hop Open-Source NoC With Wide Physical Links and End-to-End AXI4 Parallel Multistream Support

Fischer, Tim; Rogenmoser, Michael; Benz, Thomas; Gürkaynak, Frank K.; Benini, Luca

doi:10.1109/tvlsi.2025.3527225

The new generation of domain-specific AI accelerators is characterized by rapidly increasing demands for bulk data transfers, as opposed to the small, latency-critical cache line transfers typical of traditional cache-coherent systems. In this article, we address this need by introducing the FlooNoC network-on-chip (NoC), featuring very wide links that are fully compliant with the Advanced eXtensible Interface (AXI4) and designed to meet massive bandwidth needs at high energy efficiency. At the transport level, nonblocking transactions are supported for latency tolerance. In addition, a novel end-to-end ordering approach for AXI4, enabled by a multistream-capable direct memory access (DMA) engine, simplifies network interfaces (NIs) and eliminates interstream dependencies. Furthermore, dedicated physical links are instantiated for short, latency-critical messages. A complete end-to-end reference implementation in 12-nm FinFET technology demonstrates the physical feasibility and the power, performance, and area (PPA) benefits of our approach. Using wide links on high levels of metal, we achieve a bandwidth of 645 Gb/s/link and a total aggregate bandwidth of 103 Tb/s for an 8 × 4 mesh of processor cluster tiles, with a total of 288 RISC-V cores. The NoC imposes a minimal area overhead of only 3.5% per compute tile and achieves a leading-edge energy efficiency of 0.15 pJ/B/hop at 0.8 V. Compared with state-of-the-art (SoA) NoCs, our system offers three times the energy efficiency and more than double the link bandwidth. Furthermore, compared with a traditional AXI4-based multilayer interconnect, our NoC achieves a 30% reduction in area, corresponding to a 47% increase in GFLOPS_DP within the same floorplan.
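
To put the headline figures in perspective, the following is a minimal back-of-the-envelope sketch in Python. It uses only the numbers quoted in the abstract; the helper names (`transfer_energy_nj`, `transfer_time_ns`, and so on) are hypothetical and not part of FlooNoC. It assumes the 0.15 pJ/B/hop figure scales linearly with payload size and hop count, and it ignores protocol overhead, contention, and router latency, so the results are idealized estimates rather than measured values.

```python
# Figures quoted in the abstract (everything else below is an assumption).
LINK_BANDWIDTH_GBPS = 645.0        # Gb/s per link
AGGREGATE_BANDWIDTH_TBPS = 103.0   # Tb/s for the 8 x 4 mesh
ENERGY_PJ_PER_BYTE_HOP = 0.15      # pJ/B/hop at 0.8 V
MESH_COLS, MESH_ROWS = 8, 4        # mesh of cluster tiles
TOTAL_CORES = 288                  # RISC-V cores in the whole mesh


def link_bandwidth_gbytes_per_s() -> float:
    """Per-link bandwidth converted from Gb/s to GB/s."""
    return LINK_BANDWIDTH_GBPS / 8.0


def transfer_energy_nj(num_bytes: int, hops: int) -> float:
    """Idealized NoC energy in nJ, assuming linear scaling in bytes and hops."""
    return ENERGY_PJ_PER_BYTE_HOP * num_bytes * hops / 1000.0


def transfer_time_ns(num_bytes: int) -> float:
    """Idealized serialization time in ns over one link at peak bandwidth."""
    return num_bytes * 8 / LINK_BANDWIDTH_GBPS  # bits / (Gb/s) gives ns


def cores_per_tile() -> float:
    """Average number of cores per cluster tile."""
    return TOTAL_CORES / (MESH_COLS * MESH_ROWS)


def implied_link_count() -> float:
    """Ratio of aggregate to per-link bandwidth; how the authors count
    links is not stated in the abstract, so this is only the raw ratio."""
    return AGGREGATE_BANDWIDTH_TBPS * 1000.0 / LINK_BANDWIDTH_GBPS


if __name__ == "__main__":
    print(f"per-link bandwidth: {link_bandwidth_gbytes_per_s():.3f} GB/s")
    print(f"1 MB over 4 hops:   {transfer_energy_nj(1_000_000, 4):.0f} nJ")
    print(f"1 MB on one link:   {transfer_time_ns(1_000_000) / 1000:.1f} us")
    print(f"cores per tile:     {cores_per_tile():.0f}")
    print(f"aggregate / link:   {implied_link_count():.1f}")
```

Under these assumptions, moving 1 MB (10^6 bytes) across four hops costs about 600 nJ in the NoC and takes roughly 12.4 µs to serialize over a single link at peak bandwidth; the 288 cores spread over 32 tiles average 9 cores per tile.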

Fischer, T., Rogenmoser, M., Benz, T., Gürkaynak, F.K., Benini, L. (2025). FlooNoC: A 645-Gb/s/link 0.15-pJ/B/hop Open-Source NoC With Wide Physical Links and End-to-End AXI4 Parallel Multistream Support. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 33(4), 1094-1107 [10.1109/tvlsi.2025.3527225].



Year: 2025

Journal: IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

DOI: https://dx.doi.org/10.1109/tvlsi.2025.3527225


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1039416
