Depth estimation is crucial in several computer vision applications, and a recent trend in this field aims at inferring such a cue from a single camera. Unfortunately, despite the compelling results achieved, state-of-the-art monocular depth estimation methods are computationally demanding, thus precluding their practical deployment in several application contexts characterized by low-power constraints. Therefore, in this paper, we propose a lightweight Convolutional Neural Network based on a shallow pyramidal architecture, referred to as PyDNet, enabling monocular depth estimation on microcontrollers. The network is trained in a peculiar self-supervised manner leveraging proxy labels obtained through a traditional stereo algorithm. Moreover, we propose optimization strategies aimed at performing computations with quantized 8-bit data and map the high-level description of the network to low-level layers optimized for the target microcontroller architecture. Exhaustive experimental results on standard datasets and an in-depth evaluation with a device belonging to the popular Arm Cortex- M family confirm that obtaining sufficiently accurate monocular depth estimation on microcontrollers is feasible. To the best of our knowledge, our proposal is the first one enabling such remarkable achievement, paving the way for the deployment of monocular depth cues onto the tiny end-nodes of distributed sensor networks.

Monocular Depth Perception on Microcontrollers for Edge Applications

Poggi, Matteo;Tosi, Fabio;Aleotti, Filippo;Mattoccia, Stefano
2022

Abstract

Depth estimation is crucial in several computer vision applications, and a recent trend in this field aims at inferring such a cue from a single camera. Unfortunately, despite the compelling results achieved, state-of-the-art monocular depth estimation methods are computationally demanding, thus precluding their practical deployment in several application contexts characterized by low-power constraints. Therefore, in this paper, we propose a lightweight Convolutional Neural Network based on a shallow pyramidal architecture, referred to as PyDNet, enabling monocular depth estimation on microcontrollers. The network is trained in a peculiar self-supervised manner leveraging proxy labels obtained through a traditional stereo algorithm. Moreover, we propose optimization strategies aimed at performing computations with quantized 8-bit data and map the high-level description of the network to low-level layers optimized for the target microcontroller architecture. Exhaustive experimental results on standard datasets and an in-depth evaluation with a device belonging to the popular Arm Cortex- M family confirm that obtaining sufficiently accurate monocular depth estimation on microcontrollers is feasible. To the best of our knowledge, our proposal is the first one enabling such remarkable achievement, paving the way for the deployment of monocular depth cues onto the tiny end-nodes of distributed sensor networks.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Peluso, Valentino; Cipolletta, Antonio; Calimera, Andrea; Poggi, Matteo; Tosi, Fabio; Aleotti, Filippo; Mattoccia, Stefano
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11585/819845
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact