We propose MaskingDepth, a semi-supervised learning framework for monocular depth estimation. MaskingDepth is designed to enforce consistency between the depths obtained from strongly-augmented images and the pseudo-depths derived from weakly-augmented images, which enables mitigating the reliance on large ground-truth depth quantities. In this framework, we leverage uncertainty estimation to only retain high-confident depth predictions from the weakly-augmented branch as pseudo-depths. We also present a novel data augmentation, dubbed K-way disjoint masking, that takes advantage of a naïve token masking strategy as an augmentation, while avoiding its scale ambiguity problem between depths from weakly-and strongly-augmented branches and risk of missing small-scale objects. Experiments on KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component, its robustness to the use of fewer depth-annotated images, and superior performance compared to other state-of-the-art semi-supervised learning methods for monocular depth estimation.
Baek, J., Kim, G., Park, S., An, H., Poggi, M., Kim, S. (2024). MaskingDepth: Masked Consistency Regularization for Semi-Supervised Monocular Depth Estimation. Institute of Electrical and Electronics Engineers Inc. [10.1109/iros58592.2024.10801719].
MaskingDepth: Masked Consistency Regularization for Semi-Supervised Monocular Depth Estimation
Poggi, Matteo;
2024
Abstract
We propose MaskingDepth, a semi-supervised learning framework for monocular depth estimation. MaskingDepth is designed to enforce consistency between the depths obtained from strongly-augmented images and the pseudo-depths derived from weakly-augmented images, which enables mitigating the reliance on large ground-truth depth quantities. In this framework, we leverage uncertainty estimation to only retain high-confident depth predictions from the weakly-augmented branch as pseudo-depths. We also present a novel data augmentation, dubbed K-way disjoint masking, that takes advantage of a naïve token masking strategy as an augmentation, while avoiding its scale ambiguity problem between depths from weakly-and strongly-augmented branches and risk of missing small-scale objects. Experiments on KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component, its robustness to the use of fewer depth-annotated images, and superior performance compared to other state-of-the-art semi-supervised learning methods for monocular depth estimation.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


