Unleashing 8-Bit Floating Point Formats Out of the Deep-Learning Domain

Mirsalari, Seyed Ahmad; Tagliavini, Giuseppe
2024

Abstract

Reduced-precision floating-point (FP) arithmetic is a technology trend aimed at minimizing memory usage and execution time on power-constrained devices. This paper explores potential applications of the 8-bit FP format beyond the classical deep learning use cases. We comprehensively analyze alternative FP8 formats, considering the allocation of mantissa and exponent bits. Additionally, we examine the impact on energy efficiency, accuracy, and execution time of several digital signal processing and classical machine learning kernels using the parallel ultra-low-power (PULP) platform based on the RISC-V instruction set architecture. Our findings show that an appropriate choice of exponent width, combined with scaling methods, yields acceptable errors compared to FP32. Our study facilitates the adoption of FP8 formats outside the deep learning domain to achieve consistent energy efficiency and speed improvements without compromising accuracy. On average, our results indicate speedups of 3.14x, 6.19x, 11.11x, and 18.81x on 1, 2, 4, and 8 cores, respectively. Furthermore, the vectorized implementation of FP8 in the same setup delivers remarkable energy savings of 2.97x, 5.07x, 7.37x, and 15.05x.
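The record itself contains no code, but the trade-off described in the abstract, allocating bits between exponent and mantissa and compensating the reduced dynamic range with a scale factor, can be illustrated with a short sketch. The C program below is not taken from the paper: the generic 1-sign/E-exponent/M-mantissa layout, the reservation of the all-ones exponent, the saturation policy, and the fp8_quantize helper are assumptions made purely for illustration. It rounds an FP32 value onto the grid of a given FP8 format and shows how the format choice and the scale factor affect the error relative to FP32.

/*
 * Illustrative sketch only (not from the paper): emulating a generic FP8
 * format with 1 sign bit, E exponent bits, and M mantissa bits (E + M = 7)
 * by rounding an FP32 value onto the FP8 grid after applying a scale factor.
 * The layout, the reserved all-ones exponent, and the saturation policy are
 * assumptions made for this sketch.
 */
#include <math.h>
#include <stdio.h>

/* Round x onto the grid of a 1/E/M FP8 format scaled by 'scale',
 * and return the dequantized (FP32) result. */
static float fp8_quantize(float x, int exp_bits, int man_bits, float scale)
{
    float v = x / scale;
    if (v == 0.0f)
        return 0.0f;

    int   bias    = (1 << (exp_bits - 1)) - 1;
    int   e_min   = 1 - bias;                      /* smallest normal exponent   */
    int   e_max   = (1 << exp_bits) - 2 - bias;    /* all-ones exponent reserved */
    float max_val = ldexpf(2.0f - ldexpf(1.0f, -man_bits), e_max);

    int e = (int)floorf(log2f(fabsf(v)));
    if (e < e_min)
        e = e_min;                                 /* subnormal range            */

    float step = ldexpf(1.0f, e - man_bits);       /* spacing of the FP8 grid    */
    float q    = roundf(v / step) * step;          /* round to nearest           */

    if (q >  max_val) q =  max_val;                /* saturate instead of inf    */
    if (q < -max_val) q = -max_val;

    return q * scale;
}

int main(void)
{
    /* Resolution: more mantissa bits (E4M3) vs. more exponent bits (E5M2). */
    float x = 0.29f;
    printf("x=%g  E4M3=%g  E5M2=%g\n",
           x, fp8_quantize(x, 4, 3, 1.0f), fp8_quantize(x, 5, 2, 1.0f));

    /* Dynamic range: a large value saturates in E4M3 unless it is scaled. */
    float y = 1000.0f;
    printf("y=%g  E4M3(scale=1)=%g  E4M3(scale=8)=%g\n",
           y, fp8_quantize(y, 4, 3, 1.0f), fp8_quantize(y, 4, 3, 8.0f));
    return 0;
}

Under these assumptions, the E4M3-style layout resolves 0.29 more finely than the E5M2-style one, while the value 1000 saturates in E4M3 unless a scale factor extends its effective range, which is the kind of exponent and scaling choice the abstract refers to.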
2024, Proceedings of the IEEE International Conference on Electronics, Circuits, and Systems, pp. 1-4
Mirsalari, S.A., Yousefzadeh, S., Hemani, A., Tagliavini, G. (2024). Unleashing 8-Bit Floating Point Formats Out of the Deep-Learning Domain. Institute of Electrical and Electronics Engineers Inc. [10.1109/icecs61496.2024.10848785].
Mirsalari, Seyed Ahmad; Yousefzadeh, Saba; Hemani, Ahmed; Tagliavini, Giuseppe
Files in this record:
Float8_PULP_ICECS.pdf
Access: open access
Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review
License: free open-access license
Size: 908.35 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1009192