Tiny machine learning (TinyML) applications impose µJ/inference energy constraints, with a maximum power consumption of tens of milliwatts. Meeting these requirements at a reasonable accuracy level is extremely challenging. This work addresses the challenge with a flexible, fully digital ternary neural network (TNN) accelerator in a RISC-V-based System-on-Chip (SoC). Besides supporting ternary convolutional neural networks, we introduce extensions to the accelerator design that enable the processing of time-dilated temporal convolutional networks (TCNs). At 0.5 V, the design achieves 5.5 µJ/inference, 12.2 mW, and 8,000 inferences/s with 94.5% accuracy for a dynamic vision sensor (DVS)-based TCN, and 2.72 µJ/inference, 12.2 mW, and 3,200 inferences/s for a nontrivial 9-layer, 96-channels-per-layer convolutional network with a CIFAR-10 accuracy of 86%. The peak energy efficiency is 1,036 TOp/s/W, outperforming state-of-the-art silicon-proven TinyML quantized accelerators by 1.67× while achieving competitive accuracy.
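As a rough illustration of the two ideas the abstract combines (ternary weights and time-dilated temporal convolution), the following minimal Python sketch may help. It is not the accelerator's implementation: the function names, the symmetric threshold used for ternarization, and the zero-padding of samples before the start of the sequence are all assumptions made for this example.

```python
def ternarize(weights, threshold=0.5):
    """Map real-valued weights to {-1, 0, +1}.

    The symmetric threshold is a hypothetical choice for illustration only.
    """
    return [1 if w > threshold else -1 if w < -threshold else 0 for w in weights]


def dilated_causal_conv1d(x, kernel, dilation):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zero (assumption).
    """
    y = []
    for t in range(len(x)):
        acc = 0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            # Zero weights and out-of-range samples contribute nothing.
            if idx >= 0 and w != 0:
                acc += w * x[idx]
        y.append(acc)
    return y


x = [1, -1, 0, 1, 1, -1]                # ternary input activations
kernel = ternarize([0.9, -0.1, -0.7])   # ternary weights: [1, 0, -1]
y = dilated_causal_conv1d(x, kernel, dilation=2)
print(y)
```

With ternary weights, each multiply reduces to an add, a subtract, or a skip, and the dilation widens the temporal window without adding weights.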

Scherer, M., Mauro, A.D., Fischer, T., Rutishauser, G., Benini, L. (2023). TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology. IEEE MICRO, 43(1), 42-48 [10.1109/MM.2022.3226630].

TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology

Benini, Luca
2023

Scherer, Moritz; Mauro, Alfio Di; Fischer, Tim; Rutishauser, Georg; Benini, Luca
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/956441