Scherer, M., Di Mauro, A., Fischer, T., Rutishauser, G., Benini, L. (2023). TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology. IEEE Micro, 43(1), 42-48. doi:10.1109/MM.2022.3226630.
TCN-CUTIE: A 1,036-TOp/s/W, 2.72-µJ/Inference, 12.2-mW All-Digital Ternary Accelerator in 22-nm FDX Technology
Benini, Luca
2023
Abstract
Tiny machine learning (TinyML) applications impose µJ/inference constraints, with a maximum power consumption of tens of milliwatts. It is extremely challenging to meet these requirements at a reasonable accuracy level. This work addresses the challenge with a flexible, fully digital ternary neural network (TNN) accelerator in a reduced instruction set computer-five (RISC-V)-based System-on-Chip (SoC). Besides supporting ternary convolutional neural networks, we introduce extensions to the accelerator design that enable the processing of time-dilated temporal convolutional neural networks (TCNs). The design achieves 5.5-µJ/inference, 12.2 mW, 8,000 inferences/s at 0.5 V for a dynamic vision sensor (DVS)-based TCN with an accuracy of 94.5%, and 2.72-µJ/inference, 12.2 mW, 3,200 inferences/s at 0.5 V for a nontrivial 9-layer, 96-channels-per-layer convolutional network with CIFAR-10 accuracy of 86%. The peak energy efficiency is 1,036 TOp/s/W, outperforming state-of-the-art silicon-proven TinyML quantized accelerators by 1.67× while achieving competitive accuracy.
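To illustrate the two operations the abstract refers to, the short Python sketch below shows ternary weight quantization and a time-dilated causal 1-D convolution, the core TCN building block. This is a conceptual sketch only, not the accelerator's implementation; the threshold value, tensor shapes, and function names are assumptions made for illustration.

import numpy as np

def ternarize(w, threshold=0.05):
    # Map full-precision weights to {-1, 0, +1}; the threshold is an assumption.
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold] = 1
    q[w < -threshold] = -1
    return q

def dilated_causal_conv1d(x, w, dilation=2):
    # Causal 1-D convolution with time dilation (the TCN operation).
    # x: (in_ch, time), w: (out_ch, in_ch, kernel); integer arithmetic only.
    out_ch, in_ch, k = w.shape
    _, t = x.shape
    y = np.zeros((out_ch, t), dtype=np.int32)
    for tau in range(t):
        for j in range(k):
            t_in = tau - j * dilation  # look back in time only (causal)
            if t_in >= 0:
                y[:, tau] += w[:, :, j] @ x[:, t_in]
    return y

# Toy usage: ternary weights applied to an 8-channel, 16-step ternary input.
rng = np.random.default_rng(0)
w = ternarize(rng.normal(size=(8, 8, 3)))
x = rng.integers(-1, 2, size=(8, 16)).astype(np.int8)
print(dilated_causal_conv1d(x, w, dilation=2).shape)  # (8, 16)

Because both weights and activations are restricted to {-1, 0, +1}, each multiply-accumulate reduces to sign selection and addition, which is what enables the energy efficiency reported above.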