Real-Time Self-Supervised Monocular Depth Estimation Without GPU

Poggi, Matteo; Tosi, Fabio; Aleotti, Filippo; Mattoccia, Stefano

doi:10.1109/TITS.2022.3157265

Single-image depth estimation is a longstanding challenge in computer vision. Although the problem is ill-posed, deep learning has enabled remarkable results with both supervised and self-supervised training paradigms. State-of-the-art solutions achieve highly accurate depth estimation from a single image by deploying huge deep architectures, which require powerful dedicated hardware to run in a reasonable amount of time. This complexity makes them unsuited to a broad category of applications that rely on devices with constrained computing resources or memory. To tackle this issue, this paper proposes a family of compact yet effective CNNs for monocular depth estimation, trained with self-supervision from a binocular stereo rig. Compared with complex state-of-the-art models, our lightweight architectures, namely PyD-Net and PyD-Net2, trade a small drop in accuracy for a drastic reduction in runtime and memory requirements, by a factor ranging from 2× to 100×. Moreover, our networks can perform real-time monocular depth estimation on a broad set of embedded or consumer devices, even those not equipped with a GPU, by stopping inference early with negligible (or no) loss in accuracy, making them ideally suited to real applications with strict constraints on hardware resources or power consumption.
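The abstract mentions stopping inference early but does not describe the mechanism. As a rough illustration only, the sketch below assumes a coarse-to-fine pyramid in which each stage doubles the resolution and refines the previous estimate; stopping early then means running fewer refinement stages and upsampling the last estimate to full size. The names `upsample2x`, `coarse_to_fine`, and `max_levels`, as well as the toy refiners, are hypothetical and are not taken from the paper or its code.

```python
from typing import Callable, List

Grid = List[List[float]]  # a 2-D map of per-pixel values (e.g. disparity)


def upsample2x(grid: Grid) -> Grid:
    """Nearest-neighbour upsampling: every value becomes a 2x2 block."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def coarse_to_fine(
    coarse: Grid,
    refiners: List[Callable[[Grid], Grid]],
    max_levels: int,
) -> Grid:
    """Run at most `max_levels` refinement stages (hypothetical budget knob).

    Stages that are skipped are replaced by plain upsampling, so the output
    always has the full resolution regardless of where inference stops.
    """
    used = max(0, min(max_levels, len(refiners)))
    estimate = coarse
    for refine in refiners[:used]:
        # Each stage works at twice the resolution of the previous one.
        estimate = refine(upsample2x(estimate))
    for _ in range(len(refiners) - used):
        # Early stop: no further computation, only cheap resizing.
        estimate = upsample2x(estimate)
    return estimate


# Toy stand-ins for learned decoders: each one just adds a small correction.
toy_refiners = [lambda g: [[v + 0.5 for v in row] for row in g]] * 2

full = coarse_to_fine([[1.0]], toy_refiners, max_levels=2)   # all stages
early = coarse_to_fine([[1.0]], toy_refiners, max_levels=1)  # stop one stage early
```

In this toy setup both calls return a 4×4 map; the early-stopped one skips the cost of the last stage and, in exchange, misses its correction. How large that accuracy gap is for the real networks is an empirical question that the paper addresses, not something this sketch can show.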

Poggi, M., Tosi, F., Aleotti, F., Mattoccia, S. (2022). Real-Time Self-Supervised Monocular Depth Estimation Without GPU. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, Early access, in press, 1-12 [10.1109/TITS.2022.3157265].