
Abdein, R., Li, W., Chen, Y., Li, C., Helal, S., Youssef, M. (2026). Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling. NEURAL NETWORKS, 199, 1-13 [10.1016/j.neunet.2026.108771].

Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling

Li, Wei (Member of the Collaboration Group); Helal, Sumi (Member of the Collaboration Group)
2026

Abstract

Understanding motion and 3D structure from dynamic scenes is a fundamental computer vision challenge. Unsupervised learning addresses the high cost of annotation by training without manual labels; within this domain, self-supervised learning offers a distinct advantage by creating supervisory signals from the data’s inherent structure. While such methods avoid expensive labeling, they struggle in regions of occlusion, texture ambiguity, or non-rigid motion. To better leverage the geometric synergy between motion and structure, prior joint-learning frameworks treat these challenges with separate heuristics or use geometric constraints as simple binary masks. This paper introduces a new paradigm that reframes these issues as a unified problem of uncertainty estimation, driven by a novel principle: leveraging task inconsistency as a supervisory signal. We propose UGFD, a self-supervised Uncertainty-Guided Flow and Depth estimation framework that derives a dense uncertainty map by explicitly modeling two sources of conflict: (1) intra-task inconsistencies from local gradient disagreements and (2) inter-task inconsistencies from violations of the rigidity assumption between estimated optical flow and depth-induced scene motion. This learned uncertainty is not merely for masking but actively guides learning. Our novel Context-Aware Uncertainty (CAU) module uses this signal to prevent error propagation, while our Unrigidity-Driven (URD) loss dynamically focuses optimization on areas of high ambiguity. By unifying the handling of diverse error sources under a consistent uncertainty framework, our model learns to assess its confidence and perform robust estimation without ground truth. Extensive evaluations on KITTI benchmarks show state-of-the-art performance, while zero-shot tests on Sintel and FlyingThings3D demonstrate robust generalization.
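The inter-task cue named in the abstract (rigidity violations between estimated optical flow and the flow induced by depth under camera motion) can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's implementation: the function names, the simple pinhole reprojection, and the exponential residual-to-uncertainty mapping are all hypothetical. The idea is to compare an estimated flow field against the rigid flow obtained by reprojecting depth through a known camera pose, and turn the residual into a soft per-pixel uncertainty weight.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Flow induced by camera motion (R, t) for a static scene.

    depth: (H, W) depth map; K: (3, 3) intrinsics; R: (3, 3) rotation; t: (3,) translation.
    Returns an (H, W, 2) flow field (illustrative pinhole model, no occlusion handling).
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)        # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K).T                            # back-project to camera rays
    pts = rays * depth[..., None]                              # 3D points in frame 1
    pts2 = pts @ R.T + t                                       # transform into frame 2
    proj = pts2 @ K.T
    uv2 = proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)  # perspective divide
    return uv2 - np.stack([xs, ys], axis=-1)

def unrigidity_uncertainty(flow_est, flow_rigid, sigma=1.0):
    """Map the flow / rigid-flow residual to a (0, 1) uncertainty weight (hypothetical form)."""
    residual = np.linalg.norm(flow_est - flow_rigid, axis=-1)
    return 1.0 - np.exp(-residual / sigma)                     # ~0 where rigid, -> 1 where non-rigid

# Toy example: static scene under pure x-translation, plus one "moving object" patch.
H, W = 8, 8
depth = np.full((H, W), 5.0)
K = np.array([[50.0, 0.0, W / 2], [0.0, 50.0, H / 2], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])
f_rigid = rigid_flow(depth, K, R, t)      # fx * tx / z = 50 * 0.1 / 5 = 1 px everywhere
f_est = f_rigid.copy()
f_est[2:4, 2:4] += 3.0                    # independently moving patch violates rigidity
u = unrigidity_uncertainty(f_est, f_rigid)  # low on background, high on the patch
```

In the paper's framework this kind of residual is one of several cues fused into the dense uncertainty map; the sketch only shows why a rigidity residual separates static background from independently moving regions.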
Abdein, Rokia; Li, Wei; Chen, Yidan; Li, Chenghao; Helal, Sumi; Youssef, Moustafa
Files in this record:
Any attachments are not displayed.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1051859
Warning: the displayed data have not been validated by the university.
