QR decomposition is a numerical method used in many applications from the High-Performance Computing (HPC) domain to embedded systems. This broad spectrum of applications has drawn academic and commercial attention to developing many software libraries and domain-specific hardware solutions. In the Internet of Things (IoT) domain, multicore Parallel Ultra-Low-Power (PULP) architectures are emerging as energy-efficient alternatives, outperforming conventional single-core devices by coupling parallel processing with near-threshold computing. To the best of the authors' knowledge, our study introduces the first parallelized and optimized implementation of three distinct QR decomposition methods (Givens rotations, Gram-Schmidt process, and Householder transformation) on GAP-9, a commercial embodiment of the PULP architecture. Parallel execution on the 8-core cluster leads to a reduction in the total number of cycles by 241% for Givens rotations, 470% for Gram-Schmidt, and 567% for Householder, compared to the GAP9 1-core scenario. while each of them only consumes 0.013 mJ, 0.012 mJ, and 0.216 mJ, respectively. Compared to traditional single-core architectures based on ARM architectures, we achieve 8×, 24×, and 30× better performance and 36×, 35×, and 30× better energy efficiency, paving the way for broad adoption of complex linear algebra tasks in the IoT domain.

Kiamarzi, A., Rossi, D., Tagliavini, G. (2024). QR-PULP: Streamlining QR Decomposition for RISC-V Parallel Ultra-Low-Power Platforms. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES : Association for Computing Machinery, Inc [10.1145/3649153.3649210].

QR-PULP: Streamlining QR Decomposition for RISC-V Parallel Ultra-Low-Power Platforms

Kiamarzi, Amirhossein;Rossi, Davide;Tagliavini, Giuseppe
2024

Abstract

QR decomposition is a numerical method used in many applications from the High-Performance Computing (HPC) domain to embedded systems. This broad spectrum of applications has drawn academic and commercial attention to developing many software libraries and domain-specific hardware solutions. In the Internet of Things (IoT) domain, multicore Parallel Ultra-Low-Power (PULP) architectures are emerging as energy-efficient alternatives, outperforming conventional single-core devices by coupling parallel processing with near-threshold computing. To the best of the authors' knowledge, our study introduces the first parallelized and optimized implementation of three distinct QR decomposition methods (Givens rotations, Gram-Schmidt process, and Householder transformation) on GAP-9, a commercial embodiment of the PULP architecture. Parallel execution on the 8-core cluster leads to a reduction in the total number of cycles by 241% for Givens rotations, 470% for Gram-Schmidt, and 567% for Householder, compared to the GAP9 1-core scenario. while each of them only consumes 0.013 mJ, 0.012 mJ, and 0.216 mJ, respectively. Compared to traditional single-core architectures based on ARM architectures, we achieve 8×, 24×, and 30× better performance and 36×, 35×, and 30× better energy efficiency, paving the way for broad adoption of complex linear algebra tasks in the IoT domain.
2024
Proceedings of the 21st ACM International Conference on Computing Frontiers, CF 2024
147
154
Kiamarzi, A., Rossi, D., Tagliavini, G. (2024). QR-PULP: Streamlining QR Decomposition for RISC-V Parallel Ultra-Low-Power Platforms. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES : Association for Computing Machinery, Inc [10.1145/3649153.3649210].
Kiamarzi, Amirhossein; Rossi, Davide; Tagliavini, Giuseppe
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/1009197
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 0
social impact