Single-image depth estimation represents a longstanding challenge in computer vision and although it is an ill-posed problem, deep learning enabled astonishing results leveraging both supervised and self-supervised training paradigms. State-of-the-art solutions achieve remarkably accurate depth estimation from a single image deploying huge deep architectures, requiring powerful dedicated hardware to run in a reasonable amount of time. This overly demanding complexity makes them unsuited for a broad category of applications requiring devices with constrained resources or memory consumption. To tackle this issue, in this paper a family of compact, yet effective CNNs for monocular depth estimation is proposed, by leveraging self-supervision from a binocular stereo rig. Our lightweight architectures, namely PyD-Net and PyD-Net2, compared to complex state-of-the-art trade a small drop in accuracy to drastically reduce the runtime and memory requirements by a factor ranging from 2× to 100×. Moreover, our networks can run real-time monocular depth estimation on a broad set of embedded or consumer devices, even not equipped with a GPU, by early stopping the inference with negligible (or no) loss in accuracy, making it ideally suited for real applications with strict constraints on hardware resources or power consumption.

Real-Time Self-Supervised Monocular Depth Estimation Without GPU

Poggi, Matteo;Tosi, Fabio;Aleotti, Filippo;Mattoccia, Stefano
2022

Abstract

Single-image depth estimation represents a longstanding challenge in computer vision and although it is an ill-posed problem, deep learning enabled astonishing results leveraging both supervised and self-supervised training paradigms. State-of-the-art solutions achieve remarkably accurate depth estimation from a single image deploying huge deep architectures, requiring powerful dedicated hardware to run in a reasonable amount of time. This overly demanding complexity makes them unsuited for a broad category of applications requiring devices with constrained resources or memory consumption. To tackle this issue, in this paper a family of compact, yet effective CNNs for monocular depth estimation is proposed, by leveraging self-supervision from a binocular stereo rig. Our lightweight architectures, namely PyD-Net and PyD-Net2, compared to complex state-of-the-art trade a small drop in accuracy to drastically reduce the runtime and memory requirements by a factor ranging from 2× to 100×. Moreover, our networks can run real-time monocular depth estimation on a broad set of embedded or consumer devices, even not equipped with a GPU, by early stopping the inference with negligible (or no) loss in accuracy, making it ideally suited for real applications with strict constraints on hardware resources or power consumption.
Poggi, Matteo; Tosi, Fabio; Aleotti, Filippo; Mattoccia, Stefano
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11585/878689
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact