Understanding motion and 3D structure from dynamic scenes is a fundamental computer vision challenge. Unsupervised learning addresses the high cost of annotation by training without manual labels; within this domain, self-supervised learning offers a distinct advantage by creating supervisory signals from the data’s inherent structure. While such methods avoid expensive labeling, they struggle in regions of occlusion, texture ambiguity, or non-rigid motion. Prior joint-learning frameworks, which aim to exploit the geometric synergy between motion and structure, handle these challenges with separate heuristics or apply geometric constraints only as simple binary masks. This paper introduces a new paradigm that reframes these issues as a unified problem of uncertainty estimation, driven by a novel principle: leveraging task inconsistency as a supervisory signal. We propose UGFD, a self-supervised Uncertainty-Guided Flow and Depth estimation framework that derives a dense uncertainty map by explicitly modeling two sources of conflict: (1) intra-task inconsistencies arising from local gradient disagreements and (2) inter-task inconsistencies arising from violations of the rigidity assumption between the estimated optical flow and the depth-induced scene motion. This learned uncertainty is not used merely for masking; it actively guides learning. Our novel Context-Aware Uncertainty (CAU) module uses this signal to prevent error propagation, while our Unrigidity-Driven (URD) loss dynamically focuses optimization on regions of high ambiguity. By handling diverse error sources within a single, consistent uncertainty framework, our model learns to assess its own confidence and to estimate flow and depth robustly without ground truth. Extensive evaluations on the KITTI benchmarks show state-of-the-art performance, while zero-shot tests on Sintel and FlyingThings3D demonstrate robust generalization.
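The inter-task cue above rests on a standard geometric check. Under the rigidity assumption, a depth map plus the camera's ego-motion predicts a "rigid flow". Pixels where the estimated optical flow departs from that prediction are candidates for non-rigid motion, occlusion, or estimation error. The sketch below illustrates only this generic check, under stated assumptions: a pinhole camera with known intrinsics `K` and a known relative pose `(R, t)`. The function names and the exponential squashing controlled by `sigma` are hypothetical. The sketch is not UGFD's CAU module or URD loss, whose exact formulations are not given in this abstract.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow induced purely by camera motion over a static (rigid) scene.

    depth: (H, W) depth of frame 1; K: (3, 3) pinhole intrinsics (assumed known);
    R: (3, 3) rotation and t: (3,) translation from camera 1 to camera 2.
    Returns a (2, H, W) array of (du, dv) displacements.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=np.float64), np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)  # homogeneous pixels
    rays = np.linalg.inv(K) @ pix                   # back-project to unit-depth rays
    pts = rays * depth.reshape(1, -1)               # 3D points in camera 1
    pts2 = R @ pts + t.reshape(3, 1)                # same points in camera 2
    proj = K @ pts2
    proj = proj[:2] / np.clip(proj[2:3], 1e-6, None)  # perspective divide (guard z ~ 0)
    return (proj - pix[:2]).reshape(2, H, W)


def rigidity_inconsistency(flow, depth, K, R, t, sigma=1.0):
    """Per-pixel disagreement between estimated optical flow and rigid flow.

    Returns the endpoint residual (H, W) and a hypothetical soft score in [0, 1),
    where larger values flag likely violations of the rigidity assumption.
    """
    residual = np.linalg.norm(flow - rigid_flow(depth, K, R, t), axis=0)
    soft = 1.0 - np.exp(-residual / sigma)  # illustrative squashing, not the paper's
    return residual, soft
```

With an identity pose the rigid flow is zero everywhere, so any residual comes entirely from the optical flow. Under a pure sideways translation of 1 unit at depth 10 with focal length 100, the rigid flow is a uniform 10-pixel horizontal shift. A flow vector that deviates from this by (3, 4) at one pixel therefore yields a residual of 5 there and 0 elsewhere.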

Abdein, R., Li, W., Chen, Y., Li, C., Helal, S., Youssef, M. (2026). Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling. Neural Networks, 199, 1-13. doi:10.1016/j.neunet.2026.108771

Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling

Helal, Sumi
Member of the Collaboration Group
2026

Abdein, Rokia; Li, Wei; Chen, Yidan; Li, Chenghao; Helal, Sumi; Youssef, Moustafa
Files in this item:

NN-draft.pdf
Embargoed until 23 February 2027
Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review
License: Open Access license, Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
Size: 2.04 MB
Format: Adobe PDF

1-s2.0-S0893608026002339-main.compressed.pdf
Restricted access
Type: Publisher's version (PDF) / Version of Record
License: Restricted-access license
Size: 1.43 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1051859