Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling

Li, Wei (Member of the Collaboration Group); Helal, Sumi (Member of the Collaboration Group)

2026

Abstract

Understanding motion and 3D structure from dynamic scenes is a fundamental computer vision challenge. Unsupervised learning addresses the high cost of annotation by training without manual labels; within this domain, self-supervised learning offers a distinct advantage by creating supervisory signals from the data’s inherent structure. While such methods avoid expensive labeling, they struggle in regions of occlusion, texture ambiguity, or non-rigid motion. Prior joint-learning frameworks leverage the geometric synergy between motion and structure, but they treat these challenges with separate heuristics or use geometric constraints as simple binary masks. This paper introduces a new paradigm that reframes these issues as a unified problem of uncertainty estimation, driven by a novel principle: leveraging task inconsistency as a supervisory signal. We propose UGFD, a self-supervised Uncertainty Guided Flow and Depth estimation framework that derives a dense uncertainty map by explicitly modeling two sources of conflict: (1) intra-task inconsistencies from local gradient disagreements and (2) inter-task inconsistencies from violations of the rigidity assumption between estimated optical flow and depth-induced scene motion. This learned uncertainty is not merely used for masking; it actively guides learning. Our novel Context-Aware Uncertainty (CAU) module uses this signal to prevent error propagation, while our Unrigidity-Driven (URD) loss dynamically focuses optimization on areas of high ambiguity. By unifying the handling of diverse error sources under a consistent uncertainty framework, our model learns to assess its confidence and perform robust estimation without ground truth. Extensive evaluations on KITTI benchmarks show state-of-the-art performance, while zero-shot tests on Sintel and FlyingThings3D demonstrate robust generalization.
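The abstract gives no formulas, so the following is only a minimal sketch of how an inter-task inconsistency between estimated optical flow and depth-induced scene motion might be measured. It is not the authors' implementation. The function names `rigid_flow` and `inter_task_uncertainty`, the pinhole camera model, the exponential mapping to the range [0, 1), and the `scale` parameter are all assumptions made for illustration.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow induced by camera motion alone (hypothetical helper).

    depth: (H, W) depth map of frame 1
    K:     (3, 3) pinhole intrinsics
    R, t:  rotation (3, 3) and translation (3,) taking frame-1 camera
           coordinates to frame-2 camera coordinates
    Returns an (H, W, 2) array of pixel displacements.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                     # back-project to unit depth
    pts = rays * depth[..., None]                       # 3D points in frame 1
    pts2 = pts @ R.T + t                                # same points in frame 2
    proj = pts2 @ K.T                                   # project into image 2
    uv2 = proj[..., :2] / np.clip(proj[..., 2:3], 1e-6, None)
    return uv2 - pix[..., :2]


def inter_task_uncertainty(flow, depth, K, R, t, scale=1.0):
    """Map the flow/rigid-flow disagreement to a soft score in [0, 1).

    Assumption: a large residual signals a violation of the rigidity
    assumption (moving objects, occlusions, or estimation errors).
    """
    residual = np.linalg.norm(flow - rigid_flow(depth, K, R, t), axis=-1)
    return 1.0 - np.exp(-residual / scale)
```

Under these assumptions, a static scene viewed by a translating camera yields a residual near zero everywhere, while an independently moving object raises the score only in its own region. That soft score is what distinguishes a dense uncertainty map from a binary mask.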

Year: 2026

Journal: NEURAL NETWORKS

DOI: https://dx.doi.org/10.1016/j.neunet.2026.108771

Citation: Abdein, R., Li, W., Chen, Y., Li, C., Helal, S., Youssef, M. (2026). Self-Supervised joint flow and depth estimation via Multi-Cue uncertainty modeling. NEURAL NETWORKS, 199, 1-13 [10.1016/j.neunet.2026.108771].

All authors: Abdein, Rokia; Li, Wei; Chen, Yidan; Li, Chenghao; Helal, Sumi; Youssef, Moustafa

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1051859
