Confidence measures for stereo have gained popularity in recent years thanks to their improved ability to detect outliers and the growing number of applications exploiting these cues. In this field, convolutional neural networks have achieved top performance compared to other techniques in the literature by processing local information to tell correct disparity assignments from outliers. Despite these outstanding achievements, all approaches rely on cues extracted with small receptive fields, thus ignoring most of the overall image content. Therefore, in this paper, we propose to exploit both nearby and farther cues available from the image and disparity domains to obtain a more accurate confidence estimation. While local information is very effective for detecting high-frequency patterns, it lacks insight from farther regions in the scene. On the other hand, enlarging the receptive field makes it possible to include cues from farther regions but produces a smoother uncertainty estimation, which is not particularly accurate on high-frequency patterns. For these reasons, we propose a multi-stage cascaded network that combines the best of both worlds. Extensive experiments on three datasets using three popular stereo algorithms show that the proposed framework outperforms state-of-the-art confidence estimation techniques.
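The contrast between small and enlarged receptive fields can be made concrete with the standard receptive-field recurrence for stacked convolutions. The sketch below is illustrative only: the helper `receptive_field` and both layer configurations are hypothetical and are not taken from the paper's architecture.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    layers: list of (kernel_size, stride) tuples, ordered input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical "local" stack: four 3x3 convolutions, all stride 1.
local_stack = [(3, 1)] * 4

# Hypothetical "global" stack: 3x3 convolutions alternating stride 1 and stride 2.
global_stack = [(3, 1), (3, 2)] * 4

print(receptive_field(local_stack))   # 9
print(receptive_field(global_stack))  # 61
```

Under these assumed configurations, the stride-1 stack sees only a 9x9 patch, while the strided stack covers a 61x61 region at the cost of a coarser output. This is the trade-off between sharp local detail and smoother, wider context described above.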
Tosi, F., Poggi, M., Benincasa, A., Mattoccia, S. (2018). Beyond local reasoning for stereo confidence estimation with deep learning. Springer Verlag [10.1007/978-3-030-01231-1_20].
Beyond local reasoning for stereo confidence estimation with deep learning
Tosi, Fabio;Poggi, Matteo;Mattoccia, Stefano
2018