CRIS Current Research Information System

A Multicast-Capable AXI Crossbar for Many-core Machine Learning Accelerators

Colagrande, Luca; Benini, Luca

2025

Abstract

To keep up with the growing computational requirements of machine learning workloads, many-core accelerators integrate an ever-increasing number of processing elements, putting the efficiency of memory and interconnect subsystems to the test. In this work, we present the design of a multicast-capable AXI crossbar, with the goal of enhancing data movement efficiency in massively parallel machine learning accelerators. We propose a lightweight, yet flexible, multicast implementation, with a modest area and timing overhead (12 % and 6 %, respectively) even on the largest physically implementable 16-to-16 AXI crossbar. To demonstrate the flexibility and end-to-end benefits of our design, we integrate our extension into an open-source 288-core accelerator. We report tangible performance improvements on a key computational kernel for machine learning workloads, matrix multiplication, measuring a 29 % speedup on our reference system.

Year: 2025
Volume title: AICAS 2025 - 2025 7th IEEE International Conference on Artificial Intelligence Circuits and Systems, Proceedings
First page: 1
Last page: 5
DOI: https://dx.doi.org/10.1109/aicas64808.2025.11173099
Citation: Colagrande, L., Benini, L. (2025). A Multicast-Capable AXI Crossbar for Many-core Machine Learning Accelerators. Institute of Electrical and Electronics Engineers Inc. [10.1109/aicas64808.2025.11173099].
All authors: Colagrande, Luca; Benini, Luca

Files in this record:

Attachments, if any, are not displayed.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1039991

Warning

Warning! The data displayed have not been validated by the university.
