This work proposes a low-power, high-accuracy embedded hand-gesture recognition algorithm targeting battery-operated wearable devices using low-power short-range RADAR sensors. A 2-D convolutional neural network (CNN) using range-frequency Doppler features is combined with a temporal convolutional neural network (TCN) for time-sequence prediction. The final algorithm has a model size of only 46 thousand parameters, yielding a memory footprint of only 92 KB. Two data sets containing 11 challenging hand gestures performed by 26 different people have been recorded, with a total of 20,210 gesture instances. On the 11-gesture data set, accuracies of 86.6% (26 users) and 92.4% (single user) have been achieved. These results are comparable to the state of the art, which achieves 87% (10 users) and 94% (single user), and are obtained with a TCN-based network that is $7500\times$ smaller. Furthermore, the gesture recognition classifier has been implemented on a parallel ultralow-power processor, demonstrating that real-time prediction is feasible with only 21 mW of power consumption for the full TCN sequence prediction network, while the system-level power consumption stays below 120 mW. We provide open-source access to example code and all data collected and used in this work on tinyradar.ethz.ch.
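The relation between the reported parameter count and memory footprint can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `footprint_bytes` is hypothetical, and the 16-bit weight width is an assumption that happens to reproduce the reported 92 KB; the abstract itself does not state the bit width.

```python
def footprint_bytes(num_params: int, bits_per_param: int) -> int:
    """Return the storage needed for the weights alone, in bytes."""
    if bits_per_param % 8 != 0:
        raise ValueError("this sketch only handles whole-byte widths")
    return num_params * (bits_per_param // 8)


NUM_PARAMS = 46_000  # "46 thousand parameters", as stated in the abstract

# Assumption: 16-bit weights (not stated in the abstract).
size = footprint_bytes(NUM_PARAMS, 16)
print(size / 1000, "KB")  # 92.0 KB
```

Under the same assumption, storing the weights as 32-bit values would double the footprint to 184 KB, which shows how strongly the weight width affects whether a model fits on a small embedded target.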
Scherer M., Magno M., Erb J., Mayer P., Eggimann M., Benini L. (2021). TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars. IEEE Internet of Things Journal, 8(13), 10336-10346 [10.1109/JIOT.2021.3067382].