Rüegg, T., Giordano, M., Polonelli, T., Benini, L., Magno, M. (2025). Nano VS: a Neural Perception Layer for Fully Onboard Visual Semantic Mapping on Tiny Robots. IEEE. https://doi.org/10.1109/ijcnn64981.2025.11228606
Nano VS: a Neural Perception Layer for Fully Onboard Visual Semantic Mapping on Tiny Robots
Polonelli, Tommaso; Benini, Luca; Magno, Michele
2025
Abstract
Achieving Simultaneous Localization and Mapping (SLAM) in an unfamiliar environment is a crucial challenge, especially for robots that rely on efficient on-device processing. While accurate mapping is achievable on high-end robotic systems, it still faces substantial challenges due to hardware and latency constraints, especially on smaller robots with limited power budgets. Although machine learning is proving highly effective for robot perception, there is a growing need for solutions that are lightweight in both computation and sensing. This paper presents Nano VS, a lightweight monocular perception layer supporting semantic mapping with fewer than 1 M parameters. We propose a family of quantized and efficient models integrating emerging attention layers and weight sharing in a multi-task neural network. Experimental results demonstrate multiple tasks within a single model, including Semantic Segmentation (SS), Feature Detection and Description (FDD), and Visual Place Recognition (VPR). Our findings indicate that multi-tasking effectively reduces computational overhead by eliminating the need for multiple networks. Nano VS achieves 70% classwise mIoU on the Cityscapes benchmark and 66% Recall@1 on the Pitts30k benchmark with tiny images (120×160 pixels). Finally, this paper implements and evaluates Nano VS on a novel milliwatt multi-core RISC-V Microcontroller (MCU), running the full semantic front-end in as little as 52 ms while consuming only 9 mJ per inference. This work represents a significant step towards making advanced SLAM capabilities accessible to tiny robots, and towards faster, more energy-efficient SLAM on high-end processors.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
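The reported latency (52 ms) and energy per inference (9 mJ) imply a sub-watt average power draw, consistent with the "milliwatt MCU" claim. A quick sanity check of that arithmetic (the derived average power is our own calculation, not a figure reported in the abstract):

```python
# Back-of-the-envelope check of the reported efficiency figures.
# Latency (52 ms) and energy per inference (9 mJ) are taken from the
# abstract; the average power and inferences-per-joule are derived.

latency_s = 52e-3   # full semantic front-end latency, in seconds
energy_j = 9e-3     # energy per inference, in joules

avg_power_w = energy_j / latency_s     # P = E / t
inferences_per_joule = 1.0 / energy_j  # how many inferences 1 J buys

print(f"average power: {avg_power_w * 1e3:.0f} mW")       # ~173 mW
print(f"inferences per joule: {inferences_per_joule:.0f}")
```

At roughly 173 mW during inference, the front-end sits comfortably in the power envelope of small battery-powered robots, which is the regime the paper targets.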



