Scherer, M., Cioflan, C., Magno, M., & Benini, L. (2024). Work in Progress: Linear Transformers for TinyML. Institute of Electrical and Electronics Engineers Inc. DOI: 10.23919/DATE58400.2024.10546828.
Work in Progress: Linear Transformers for TinyML
Scherer M.; Benini L.
2024
Abstract
We present WaveFormer, a neural network architecture based on a linear attention transformer that enables long-sequence inference on TinyML devices. WaveFormer achieves a new state-of-the-art accuracy of 98.8% and 99.1% on the Google Speech Commands V2 keyword spotting (KWS) dataset for the 12- and 35-class problems with only 130 kB of weight storage, compatible with MCU-class devices. Top-1 accuracy is improved by 0.1 and 0.9 percentage points while the model size and number of operations are reduced by 2.5× and 4.7× compared to the state of the art. We also propose a hardware-friendly 8-bit integer quantization algorithm for the linear attention operator, enabling efficient deployment on low-cost, ultra-low-power microcontrollers without loss of accuracy.
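Below is a minimal sketch of a kernel-based linear attention operator of the kind the abstract refers to, paired with a simple int8 quantization helper. The feature map (elu + 1), tensor shapes, and the symmetric per-tensor quantization scheme are illustrative assumptions; the paper's actual WaveFormer architecture and quantization algorithm are not reproduced here. The key property shown is that the cost scales as O(L·d²) in sequence length L rather than O(L²·d), which is what makes long-sequence inference feasible on MCU-class devices.

```python
# Hedged sketch: linear attention (in the style of Katharopoulos et al., 2020)
# plus a hypothetical symmetric int8 quantizer. Not the paper's exact method.
import numpy as np

def feature_map(x):
    # elu(x) + 1 keeps features positive; a common choice for linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    # q, k, v: (seq_len, dim). Cost is O(seq_len * dim^2), not O(seq_len^2 * dim).
    qf, kf = feature_map(q), feature_map(k)      # (L, d)
    kv = kf.T @ v                                # (d, d) summary of keys and values
    z = qf @ kf.sum(axis=0)                      # (L,) normalizer
    return (qf @ kv) / (z[:, None] + 1e-6)       # (L, d)

def quantize_int8(x):
    # Illustrative symmetric per-tensor int8 quantization (assumption, not the paper's scheme).
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

# Usage example with random activations
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)).astype(np.float32) for _ in range(3))
out = linear_attention(q, k, v)
q_int8, q_scale = quantize_int8(q)
print(out.shape, q_int8.dtype)
```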