Random forests (RFs) use a collection of decision trees (DTs) to perform classification or regression. RFs are adopted in a wide variety of machine learning (ML) applications, and they are increasingly used in scenarios at the extreme edge of the Internet of Things (TinyML), where memory constraints are particularly tight. This article addresses the optimization of the computational and storage costs of running DTs on the microcontroller units (MCUs) typically deployed in TinyML scenarios. We introduce three alternative DT kernels optimized for memory- and compute-limited MCUs, providing insight into the key memory-latency tradeoffs on an open-source RISC-V platform. We identify key bottlenecks and demonstrate that software optimizations enable significant reductions in memory footprint and latency. Experimental results show that the optimized kernels achieve up to 4.5 µs latency, 4.8× speedup, and 45% storage reduction against the widely adopted naive DT design. We carry out a detailed performance and energy cost analysis of various optimized DT variants: the best approach requires just 8 instructions and 0.155 pJ per decision.
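To make the notion of a DT kernel concrete, the following is a minimal C sketch of array-based tree traversal with majority voting across a forest. It is an illustration only, not one of the kernels from the article: the `dt_node_t` layout, the names `dt_predict` and `rf_predict`, the `MAX_CLASSES` bound, and the example trees are all assumptions introduced here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CLASSES 8 /* assumed upper bound on the number of classes */

/* Hypothetical node layout (8 bytes per node), not taken from the article. */
typedef struct {
    int16_t  feature;   /* index of the feature to test; -1 marks a leaf */
    int16_t  threshold; /* split threshold, or the class label at a leaf */
    uint16_t left;      /* child index when x[feature] <= threshold */
    uint16_t right;     /* child index otherwise */
} dt_node_t;

/* Walk one tree from the root (index 0) down to a leaf; return its class. */
int dt_predict(const dt_node_t *tree, const int16_t *x) {
    uint16_t i = 0;
    while (tree[i].feature >= 0) {
        i = (x[tree[i].feature] <= tree[i].threshold) ? tree[i].left
                                                      : tree[i].right;
    }
    return tree[i].threshold;
}

/* Majority vote over all trees; ties go to the lowest class index. */
int rf_predict(const dt_node_t *const trees[], int n_trees,
               const int16_t *x, int n_classes) {
    int votes[MAX_CLASSES] = {0};
    assert(n_classes > 0 && n_classes <= MAX_CLASSES);
    for (int t = 0; t < n_trees; t++) {
        votes[dt_predict(trees[t], x)]++;
    }
    int best = 0;
    for (int c = 1; c < n_classes; c++) {
        if (votes[c] > votes[best]) {
            best = c;
        }
    }
    return best;
}

/* Made-up example data: a three-leaf tree and a two-leaf stump. */
static const dt_node_t example_tree[] = {
    { 0, 5, 1, 2},  /* x[0] <= 5 ? node 1 : node 2 */
    {-1, 0, 0, 0},  /* leaf: class 0 */
    { 1, 2, 3, 4},  /* x[1] <= 2 ? node 3 : node 4 */
    {-1, 1, 0, 0},  /* leaf: class 1 */
    {-1, 2, 0, 0},  /* leaf: class 2 */
};
static const dt_node_t example_stump[] = {
    { 1, 4, 1, 2},  /* x[1] <= 4 ? node 1 : node 2 */
    {-1, 1, 0, 0},  /* leaf: class 1 */
    {-1, 2, 0, 0},  /* leaf: class 2 */
};
static const dt_node_t *const example_forest[3] = {
    example_tree, example_tree, example_stump
};
```

In a layout like this, each decision costs a node load, a feature load, a comparison, and an index update, so the node size and the number of instructions per decision are the natural quantities to examine when trading memory footprint against latency.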
Tabanelli E., Tagliavini G., Benini L. (2022). Optimizing Random Forest Based Inference on RISC-V MCUs at the Extreme Edge. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11), 4516-4526 [10.1109/TCAD.2022.3199903].
Optimizing Random Forest Based Inference on RISC-V MCUs at the Extreme Edge
Tabanelli E.; Tagliavini G.; Benini L.
2022
File | Type | License | Size | Format |
---|---|---|---|---|
TCAD_RF_postprint.pdf (open access) | Postprint | Free open-access license | 3 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.