CRIS Current Research Information System


Wiese, P., İslamoğlu, G., Scherer, M., Macan, L., Jung, V.J.B., Burrello, A., et al. (2025). Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow. IEEE DESIGN & TEST, Early access, 1-1 [10.1109/mdat.2025.3527371].

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

Wiese, Philip; İslamoğlu, Gamze; Scherer, Moritz; Macan, Luka; Jung, Victor J. B.; Burrello, Alessio; Conti, Francesco; Benini, Luca

2025

Abstract

One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators and is supported by an automated deployment flow. We demonstrate Attention-based models in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency of 2960 GOp/J and throughput of 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
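As a rough reading aid for the two headline figures, the sketch below derives the power and the energy per operation they would imply together. It assumes that both numbers refer to the same operating point, which the abstract does not state explicitly, and the function names are illustrative rather than taken from the paper.

```python
# Figures quoted in the abstract (0.65 V, 22 nm FD-SOI).
ENERGY_EFFICIENCY_GOP_PER_J = 2960.0
THROUGHPUT_GOP_PER_S = 154.0


def implied_power_mw(throughput_gop_s: float, efficiency_gop_j: float) -> float:
    """Power in mW implied by the two figures.

    (GOp/s) / (GOp/J) = J/s = W, then scaled to milliwatts.
    Assumption: both figures are measured at the same operating point.
    """
    return throughput_gop_s / efficiency_gop_j * 1e3


def energy_per_op_pj(efficiency_gop_j: float) -> float:
    """Energy per operation in pJ: 1 / (GOp/J) = nJ/Op = 1e3 pJ/Op."""
    return 1e3 / efficiency_gop_j


if __name__ == "__main__":
    power = implied_power_mw(THROUGHPUT_GOP_PER_S, ENERGY_EFFICIENCY_GOP_PER_J)
    energy = energy_per_op_pj(ENERGY_EFFICIENCY_GOP_PER_J)
    print(f"implied power: {power:.1f} mW")
    print(f"energy per operation: {energy:.2f} pJ")
```

Under that assumption, the figures correspond to roughly 52 mW and about 0.34 pJ per operation. If the two values were instead measured at different operating points, this quotient would not describe an actual power draw.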


Year: 2025
Journal: IEEE Design & Test
DOI: https://dx.doi.org/10.1109/mdat.2025.3527371


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1000948
