Late Breaking Results: A RISC-V ISA Extension for Chaining in Scalar Processors

Benini, Luca
2025

Abstract

Modern general-purpose accelerators integrate a large number of programmable, area- and energy-efficient processing elements (PEs) to deliver high performance while meeting stringent power delivery and thermal dissipation constraints. In this context, PEs are often implemented as scalar in-order cores, which are highly sensitive to pipeline stalls. Traditional software techniques, such as loop unrolling, mitigate the issue at the cost of increased register pressure, limiting flexibility. We propose scalar chaining, a novel hardware-software solution, to address this issue without incurring the drawbacks of traditional software-only techniques. We demonstrate our solution on register-limited stencil codes, achieving >93% FPU utilization, along with a 4% speedup and 10% higher energy efficiency, on average, over highly optimized baselines. Our implementation is fully open source, and performance experiments are reproducible using free software: https://github.com/colluca/snitch-cluster/tree/chaining
2025
Proceedings - Design, Automation and Test in Europe, DATE
Pages 1-2
Colagrande, L., Jonnalagadda, J., Benini, L. (2025). Late Breaking Results: A RISC-V ISA Extension for Chaining in Scalar Processors. Institute of Electrical and Electronics Engineers Inc. [10.23919/date64628.2025.10992722].
Colagrande, Luca; Jonnalagadda, Jayanth; Benini, Luca

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1040757