Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication

Benini, Luca
2025

Abstract

Next-generation wireless technologies (for immersive and massive communication, joint communication and sensing) demand highly parallel architectures for massive data processing. A common architectural template scales up by grouping tens to hundreds of cores into shared-memory clusters, which are then scaled out as multi-cluster manycore systems. This hierarchical design, used in GPUs and accelerators, requires balancing a few large clusters against many smaller ones, a choice that affects design complexity, synchronization, communication efficiency, and programmability. Although all multi-cluster architectures must balance these trade-offs, there is limited insight into optimal cluster sizes. This paper analyzes various cluster configurations, focusing on synchronization, data movement overhead, and programmability for typical wireless sensing and communication workloads. We extend the open-source shared-memory cluster MemPool into a multi-cluster architecture and propose a novel double-buffering barrier that decouples processor and DMA. Our results show that a single 256-core cluster can be twice as fast as sixteen 16-core clusters for memory-bound kernels and up to 24% faster for compute-bound kernels, owing to reduced synchronization and communication overheads.
2025 10th International Workshop on Advances in Sensors and Interfaces (IWASI), pp. 1-6
Riedel, S., Zhang, Y., Bertuletti, M., Benini, L. (2025). Optimizing Scalable Multi-Cluster Architectures for Next-Generation Wireless Sensing and Communication. New York, NY, USA: IEEE [10.1109/iwasi66786.2025.11122009].
Riedel, Samuel; Zhang, Yichao; Bertuletti, Marco; Benini, Luca

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1040852