Mirsalari, S.A., Yousefzadeh, S., Hemani, A., Tagliavini, G. (2024). Unleashing 8-Bit Floating Point Formats Out of the Deep-Learning Domain. Institute of Electrical and Electronics Engineers Inc. doi:10.1109/icecs61496.2024.10848785.
Unleashing 8-Bit Floating Point Formats Out of the Deep-Learning Domain
Mirsalari, Seyed Ahmad; Tagliavini, Giuseppe
2024
Abstract
Reduced-precision floating-point (FP) arithmetic is a technology trend for minimizing memory usage and execution time on power-constrained devices. This paper explores potential applications of 8-bit FP (FP8) formats beyond the classical deep learning use cases. We comprehensively analyze alternative FP8 formats, considering the allocation of mantissa and exponent bits. Additionally, we examine the impact on energy efficiency, accuracy, and execution time of several digital signal processing and classical machine learning kernels using the parallel ultra-low-power (PULP) platform based on the RISC-V instruction set architecture. Our findings show that an appropriate choice of exponent width, combined with scaling methods, yields acceptable errors compared to FP32. Our study facilitates the adoption of FP8 formats outside the deep learning domain to achieve consistent energy efficiency and speed improvements without compromising accuracy. On average, our results indicate speedups of 3.14x, 6.19x, 11.11x, and 18.81x on 1, 2, 4, and 8 cores, respectively. Furthermore, the vectorized implementation of FP8 in the same setup delivers remarkable energy savings of 2.97x, 5.07x, 7.37x, and 15.05x.
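The abstract does not spell out the individual FP8 encodings it evaluates, only that different exponent/mantissa splits and scaling methods are compared against FP32. As a rough illustration of what such a split implies for range and rounding error, the sketch below quantizes FP32 values onto two common 8-bit grids. It is not the paper's code: `fp8_quantize` is a hypothetical helper, it assumes an IEEE-754-style layout that reserves the all-ones exponent code for Inf/NaN, and it uses ties-away-from-zero rounding for brevity.

```c
#include <math.h>
#include <stdio.h>

/* Quantize a float to the nearest value representable in a custom FP8
 * format with `exp_bits` exponent bits and `man_bits` mantissa bits
 * (1 sign bit + exp_bits + man_bits == 8). Assumes an IEEE-754-style
 * layout reserving the all-ones exponent for Inf/NaN (real FP8 variants
 * such as E4M3 may reclaim those codes and reach slightly further).
 * Saturates to the largest finite value; values below half of the
 * smallest subnormal flush to zero. */
static float fp8_quantize(float x, int exp_bits, int man_bits)
{
    if (x == 0.0f || !isfinite(x))
        return x;

    const int bias = (1 << (exp_bits - 1)) - 1;
    const int emin = 1 - bias;                    /* smallest normal exponent */
    const int emax = (1 << exp_bits) - 2 - bias;  /* largest normal exponent  */
    const float max_fp8 = ldexpf(2.0f - ldexpf(1.0f, -man_bits), emax);

    const float sign = (x < 0.0f) ? -1.0f : 1.0f;
    float ax = fabsf(x);

    if (ax > max_fp8)                             /* out of range: saturate   */
        return sign * max_fp8;

    /* frexpf gives ax = m * 2^e with m in [0.5, 1); the exponent of the
     * normalized 1.xxx form is therefore e - 1. */
    int e;
    (void)frexpf(ax, &e);
    e -= 1;
    if (e < emin)
        e = emin;                                 /* subnormal range          */

    /* Spacing of representable values in this binade, then snap to it. */
    const float ulp = ldexpf(1.0f, e - man_bits);
    return sign * (roundf(ax / ulp) * ulp);
}

int main(void)
{
    /* Two common exponent/mantissa splits; the paper's exact formats and
     * scaling scheme are not reproduced here. */
    const int splits[2][2] = { {4, 3}, {5, 2} };
    const float v[] = { 0.0013f, -0.37f, 2.5f, 117.0f };
    const int n = (int)(sizeof v / sizeof v[0]);

    for (int s = 0; s < 2; s++) {
        printf("E%dM%d:\n", splits[s][0], splits[s][1]);
        for (int i = 0; i < n; i++) {
            float q = fp8_quantize(v[i], splits[s][0], splits[s][1]);
            printf("  %+9.4f -> %+9.4f (abs err %.4g)\n",
                   v[i], q, fabs((double)v[i] - (double)q));
        }
    }
    return 0;
}
```

A per-array scaling method of the kind the abstract alludes to would typically multiply the inputs by a factor that brings their largest magnitude inside the FP8 range before quantization and divide it back out afterwards; the specific scaling used in the paper is not reproduced here.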
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Float8_PULP_ICECS.pdf | Open access | Postprint / Author's Accepted Manuscript (AAM), version accepted for publication after peer review | Free open-access license | 908.35 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.