In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches. This is the case for depth estimation from either monocular or stereo images, with the latter often providing a valid source of self-supervision for the former. In contrast, to soften typical stereo artefacts, we propose a novel self-supervised paradigm that reverses the link between the two. Specifically, to train deep stereo networks, we distill knowledge through a monocular completion network. This architecture exploits single-image cues and a few sparse points, sourced from traditional stereo algorithms, to estimate dense yet accurate disparity maps by means of a consensus mechanism over multiple estimations. We thoroughly evaluate the impact of different supervisory signals on popular stereo datasets, showing that stereo networks trained with our paradigm outperform existing self-supervised frameworks. Finally, our proposal achieves notable generalization capabilities when dealing with domain shift.

Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation / F. Aleotti, F. Tosi, L. Zhang, M. Poggi, S. Mattoccia. - ELECTRONIC. - 12356 (2020), pp. 614-632. (Paper presented at the 16th European Conference on Computer Vision (ECCV 2020), held in Glasgow, UK (Virtual), 23-28 August 2020) [10.1007/978-3-030-58621-8_36].

Reversing the cycle: self-supervised deep stereo through enhanced monocular distillation

F. Aleotti; F. Tosi; L. Zhang; M. Poggi; S. Mattoccia
2020

Files in this item:
ECCV2020___Unsupervised_Stereo_Matching+(14) (2).pdf

Access: open access
Type: Postprint
License: Free open-access license
Size: 398.58 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/764279
Citations
  • PMC: n/a
  • Scopus: 20
  • ISI: n/a