Mirsalari, S.A., Fariselli, M., Bijar, L., Paci, F., Benini, L., Tagliavini, G. (2025). Enabling Real-Time Streaming Temporal Convolution Network Inference on Ultra-Low-Power Microcontrollers. New York (USA): IEEE Computer Society [10.1109/isvlsi65124.2025.11130291].
Enabling Real-Time Streaming Temporal Convolution Network Inference on Ultra-Low-Power Microcontrollers
Mirsalari, Seyed Ahmad;Benini, Luca;Tagliavini, Giuseppe
2025
Abstract
Real-time streaming applications play a pivotal role across diverse domains, including autonomous systems, speech processing, and bio-signal monitoring. Temporal Convolutional Networks (TCNs) effectively model sequences by capturing long-term dependencies, but real-time inference on ultra-low-power microcontrollers (MCUs) remains challenging due to high computational and memory requirements. This work presents a framework to optimize TCN inference for real-time streaming applications by introducing a multi-timestep approach combined with advanced quantization techniques. This solution enables dynamic adaptation of the streaming application by finding a trade-off between latency and computational efficiency. Deploying a speech enhancement model (Conv-TasNet) on the GAP9 ultra-low-power MCU, we achieve a 2 ms inference time (33% of the real-time constraint of 6.25 ms), along with a 108.9× reduction in MAC operations and a 27.7× cycle reduction. Using four timesteps increases the MAC/cycle ratio to 3.3 while maintaining a 4.3 ms inference time, less than 18% of the extended real-time budget (25 ms). Combining INT8-BFP16 mixed-precision quantization and multi-timestep processing delivers a 4× memory saving at the same performance.

| File | Size | Format |
|---|---|---|
| _ISVLSI_2025__TCN_Denoiser_accepted (002).pdf (embargo until 26/08/2027) | 847.69 kB | Adobe PDF |

Type: Postprint / Author's Accepted Manuscript (AAM) - version accepted for publication after peer review
License: Free open-access license
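As a rough illustration of the multi-timestep streaming idea described in the abstract, the sketch below implements a depthwise dilated causal convolution (the core TCN building block) with a persistent left-context buffer, so that one or several new timesteps can be processed per call. This is a hypothetical minimal example, not the paper's Conv-TasNet code: the class name, shapes, and depthwise formulation are all illustrative assumptions.

```python
# Hypothetical sketch: streaming dilated causal convolution with a
# multi-timestep input buffer. Names and shapes are illustrative only.
import numpy as np

class StreamingDilatedConv1d:
    """Depthwise causal dilated 1-D convolution that keeps a left-context
    buffer so new timesteps can be processed incrementally."""

    def __init__(self, weights, dilation):
        # weights: (kernel_size, channels) for a depthwise convolution
        self.w = weights
        self.dilation = dilation
        k, ch = weights.shape
        # Past context needed for causality: (kernel_size - 1) * dilation
        self.buf = np.zeros(((k - 1) * dilation, ch))

    def step(self, x):
        # x: (timesteps, channels); any number of new timesteps per call
        ctx = np.concatenate([self.buf, x], axis=0)
        k = self.w.shape[0]
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            # Gather the dilated taps ending at the current timestep
            idx = [self.buf.shape[0] + t - (k - 1 - j) * self.dilation
                   for j in range(k)]
            out[t] = (ctx[idx] * self.w).sum(axis=0)
        # Keep only the context needed for the next call
        self.buf = ctx[ctx.shape[0] - self.buf.shape[0]:]
        return out

rng = np.random.default_rng(0)
w = rng.standard_normal((3, 2))   # kernel_size=3, 2 channels
x = rng.standard_normal((8, 2))   # 8 timesteps of input

one = StreamingDilatedConv1d(w, dilation=2)
four = StreamingDilatedConv1d(w, dilation=2)
y1 = np.concatenate([one.step(x[t:t + 1]) for t in range(8)])
y4 = np.concatenate([four.step(x[t:t + 4]) for t in range(0, 8, 4)])
# Batching four timesteps per call changes cost per call, not the output
assert np.allclose(y1, y4)
```

The equivalence check reflects the trade-off the abstract quantifies: processing several timesteps per invocation leaves the outputs unchanged while amortizing buffer management and improving compute utilization (the MAC/cycle ratio), at the price of a longer per-call latency budget.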
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


