Abdein, R., Li, W., Chen, Y., Li, C., Helal, S., & Youssef, M. (2026). Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling. Neural Networks, 199, 1-13. https://doi.org/10.1016/j.neunet.2026.108771
Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling
Li, Wei (Member of the Collaboration Group); Helal, Sumi (Member of the Collaboration Group)
2026
Abstract
Understanding motion and 3D structure from dynamic scenes is a fundamental computer vision challenge. Unsupervised learning addresses the high cost of annotation by training without manual labels; within this domain, self-supervised learning offers a distinct advantage by creating supervisory signals from the data’s inherent structure. While such methods avoid expensive labeling, they struggle in regions of occlusion, texture ambiguity, or non-rigid motion. Although prior joint-learning frameworks exploit the geometric synergy between motion and structure, they treat these challenges with separate heuristics or reduce geometric constraints to simple binary masks. This paper introduces a new paradigm that reframes these issues as a unified problem of uncertainty estimation, driven by a novel principle: leveraging task inconsistency as a supervisory signal. We propose UGFD, a self-supervised Uncertainty Guided Flow and Depth estimation framework that derives a dense uncertainty map by explicitly modeling two sources of conflict: (1) intra-task inconsistencies from local gradient disagreements and (2) inter-task inconsistencies from violations of the rigidity assumption between estimated optical flow and depth-induced scene motion. This learned uncertainty is not merely used for masking but actively guides learning. Our novel Context-Aware Uncertainty (CAU) module uses this signal to prevent error propagation, while our Unrigidity-Driven (URD) loss dynamically focuses optimization on areas of high ambiguity. By unifying the handling of diverse error sources under a consistent uncertainty framework, our model learns to assess its confidence and perform robust estimation without ground truth. Extensive evaluations on KITTI benchmarks show state-of-the-art performance, while zero-shot tests on Sintel and FlyingThings3D demonstrate robust generalization.
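
To make the inter-task (rigidity) cue in the abstract concrete, the following is a minimal sketch of how a depth- and ego-motion-induced rigid flow could be compared with an estimated optical flow to obtain an unrigidity residual and a soft confidence map. It is an illustrative assumption for exposition only, not the paper's implementation; the function names (rigid_flow_from_depth, unrigidity_map), the exponential confidence mapping, and the parameter beta are hypothetical.

import numpy as np

def rigid_flow_from_depth(depth, K, R, t):
    # Back-project each pixel with its depth, apply the relative camera motion (R, t),
    # re-project with intrinsics K, and return the induced "rigid" flow of shape (H, W, 2).
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)        # homogeneous pixel coordinates
    pts = (pix @ np.linalg.inv(K).T) * depth[..., None]        # 3D points in the first camera
    proj = (pts @ R.T + t) @ K.T                               # project into the second camera
    uv2 = proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)
    return uv2 - np.stack([xs, ys], axis=-1)

def unrigidity_map(flow_est, depth, K, R, t, beta=1.0):
    # Inter-task inconsistency: per-pixel distance between the estimated optical flow
    # and the depth/ego-motion-induced rigid flow; large residuals flag moving or
    # ambiguous regions, and exp(-beta * residual) turns them into a soft confidence.
    residual = np.linalg.norm(flow_est - rigid_flow_from_depth(depth, K, R, t), axis=-1)
    return residual, np.exp(-beta * residual)

A confidence map of this kind could then down-weight a photometric loss in non-rigid or ambiguous regions, in the spirit of the URD loss described above, rather than applying a hard binary mask.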


