Mirsalari, S.A., Bijar, L., Fariselli, M., Croome, M., Paci, F., Tagliavini, G., et al. (2024). StreamEase: Enabling Real-Time Inference of Temporal Convolution Networks on Low-Power MCUs with Stream-Oriented Automatic Transformation [10.1109/icecs61496.2024.10848742].
StreamEase: Enabling Real-Time Inference of Temporal Convolution Networks on Low-Power MCUs with Stream-Oriented Automatic Transformation
Mirsalari, Seyed Ahmad; Tagliavini, Giuseppe; Benini, Luca
2024
Abstract
Temporal Convolutional Networks (TCNs) are an effective tool for time-series analysis but face challenges in real-time deployment on resource-constrained devices. To address this, we introduce a Python library that automates the conversion of TCN models to a streaming format without affecting model accuracy, significantly reducing deployment time and computational resources. Our tool enables unified multi-timestep streaming, allowing a balance between network versatility and computational demands. We demonstrate that our stream-oriented approach reduces Multiply-Accumulate Operations (MACs) and execution cycles by up to 901× and 94.5×, respectively, across various Conv-TasNet configurations relative to non-streaming models. Real-world hardware tests on the GAP9 low-power multi-core MCU confirm that configurations maintaining a receptive field of approximately 1.2 seconds meet GAP9 memory constraints (128 kB + 1.5 MB) while reducing computational intensity by up to 33× in cycles and 313× in MACs.
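The idea behind converting a TCN to a streaming format can be illustrated with a minimal sketch. The code below is not the StreamEase API: the names `causal_conv_full` and `StreamingCausalConv` are hypothetical, and the example assumes single-channel causal dilated convolutions with zero padding on the left and one new sample per step. It shows that a layer which keeps a short history buffer produces the same outputs as the non-streaming layer applied to the whole sequence.

```python
from collections import deque


def causal_conv_full(x, w, dilation):
    """Non-streaming reference: causal dilated 1-D convolution over a whole
    sequence. Samples before the start of the sequence count as zeros."""
    k = len(w)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1 - j) * dilation  # tap j looks back in time
            if idx >= 0:
                acc += w[j] * x[idx]
        out.append(acc)
    return out


class StreamingCausalConv:
    """Hypothetical streaming layer: one input sample in, one output out."""

    def __init__(self, w, dilation):
        self.w = list(w)
        self.dilation = dilation
        # The buffer spans exactly the receptive field of this layer.
        span = (len(self.w) - 1) * dilation + 1
        self.history = deque([0.0] * span, maxlen=span)

    def step(self, sample):
        self.history.append(sample)  # the oldest sample is dropped
        k = len(self.w)
        # history[-1] is the current sample; tap j reads the sample
        # (k - 1 - j) * dilation steps in the past.
        return sum(
            self.w[j] * self.history[-1 - (k - 1 - j) * self.dilation]
            for j in range(k)
        )


# Two stacked layers with dilations 1 and 2 (illustrative values only).
x = [1.0, 2.0, -3.0, 4.0, 0.5, -1.5, 2.0, 8.0]
w1 = [0.5, -1.0, 2.0]
w2 = [1.0, 0.25, -0.5]

reference = causal_conv_full(causal_conv_full(x, w1, 1), w2, 2)

layer1 = StreamingCausalConv(w1, 1)
layer2 = StreamingCausalConv(w2, 2)
streamed = [layer2.step(layer1.step(s)) for s in x]

print(streamed == reference)
```

In this sketch each streaming step costs one kernel's worth of MACs per layer, whereas re-running the non-streaming model on a sliding window recomputes every position in the window for each new sample. The savings reported in the abstract come from the authors' measurements on Conv-TasNet configurations, not from this toy example.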