Mirsalari, S.A., Fariselli, M., Bijar, L., Paci, F., Benini, L., Tagliavini, G. (2025). Enabling Real-Time Streaming Temporal Convolution Network Inference on Ultra-Low-Power Microcontrollers. New York (USA) : IEEE Computer Society [10.1109/isvlsi65124.2025.11130291].

Enabling Real-Time Streaming Temporal Convolution Network Inference on Ultra-Low-Power Microcontrollers

Mirsalari, Seyed Ahmad; Benini, Luca; Tagliavini, Giuseppe
2025

Abstract

Real-time streaming applications play a pivotal role across diverse domains, including autonomous systems, speech processing, and bio-signal monitoring. Temporal Convolutional Networks (TCNs) effectively model sequences by capturing long-term dependencies, but real-time inference on ultra-low-power microcontrollers (MCUs) remains challenging due to high computational and memory requirements. This work presents a framework to optimize TCN inference for real-time streaming applications by introducing a multi-timestep approach combined with advanced quantization techniques. This solution enables dynamic adaptation of the streaming application by finding a trade-off between latency and computational efficiency. Deploying a speech enhancement model (Conv-TasNet) on the GAP9 ultra-low-power MCU, we achieve a 2 ms inference time (33% of the real-time constraint of 6.25 ms), along with a 108.9× reduction in MAC operations and a 27.7× cycle reduction. Using four timesteps increases the MAC/cycle ratio to 3.3 while maintaining a 4.3 ms inference time, less than 18% of the extended real-time budget (25 ms). Combining INT8-BFP16 mixed-precision quantization with multi-timestep processing delivers a 4× memory saving at the same performance.
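The multi-timestep streaming idea summarized in the abstract can be sketched as follows. This is an illustrative NumPy reimplementation of one dilated causal convolution layer processing a block of T new timesteps per call while reusing a small history buffer; it is not the paper's GAP9 code, and all function and variable names are hypothetical:

```python
import numpy as np

def dilated_causal_conv_step(state, new_frames, weights, bias, dilation):
    """One streaming step of a dilated causal conv layer (TCN building block).

    state:      (C_in, dilation*(K-1)) past input frames kept between calls
    new_frames: (C_in, T) the T new timesteps for this call (multi-timestep)
    weights:    (C_out, C_in, K) kernel, index K-1 multiplies the newest frame
    Returns the updated state and the (C_out, T) outputs for the new frames.
    """
    C_out, C_in, K = weights.shape
    # Append the new frames to the retained history.
    buf = np.concatenate([state, new_frames], axis=1)
    T = new_frames.shape[1]
    out = np.empty((C_out, T))
    for t in range(T):
        # Output t depends only on the current and past samples (causal),
        # spaced 'dilation' apart.
        end = buf.shape[1] - (T - 1 - t)
        taps = buf[:, end - 1 - dilation * (K - 1):end:dilation]  # (C_in, K)
        out[:, t] = np.tensordot(weights, taps, axes=([1, 2], [0, 1])) + bias
    # Keep only the frames the next call will still need.
    needed = dilation * (K - 1)
    return buf[:, buf.shape[1] - needed:], out
```

The initial state is a zero buffer of shape `(C_in, dilation*(K-1))`. Calling this with T > 1 computes several outputs per invocation, amortizing per-call overhead at the cost of added latency, which mirrors the latency/throughput trade-off the abstract describes.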
Year: 2025
Published in: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
Pages: 1-6
Authors: Mirsalari, Seyed Ahmad; Fariselli, Marco; Bijar, Léo; Paci, Francesco; Benini, Luca; Tagliavini, Giuseppe
Files in this item:
No attachments are available

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11585/1033172
Warning: the displayed data have not been validated by the university.

Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science: 0