SemanticFusion: Joint labeling, tracking and mapping

Cavallari, Tommaso; Di Stefano, Luigi
2016

Abstract

Kick-started by the deployment of the well-known KinectFusion, recent research on RGBD-based dense volume reconstruction has focused on addressing various shortcomings of the original algorithm. In this paper we tackle two of them: drift in the camera trajectory, caused by the accumulation of small per-frame tracking errors, and the lack of semantic information in the algorithm's output. Accordingly, we present an extended KinectFusion pipeline that takes into account per-pixel semantic labels gathered from the input frames. Using these cues, we extend the memory structure holding the reconstructed environment so that it stores per-voxel information on the kinds of objects likely to appear at each spatial location. We then take this information into account during the camera localization step to increase the accuracy of the estimated camera trajectory. Thus, we realize a SemanticFusion loop whereby per-frame labels help to better track the camera, and successful tracking makes it possible to consolidate instantaneous semantic observations into a coherent volumetric map.
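The per-voxel semantic storage described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the `Voxel` class, the `NUM_LABELS` constant, the accumulation of label scores as a weighted sum, and the `tracking_weight` helper are all hypothetical. Only the weighted running average of the truncated signed distance value follows the standard KinectFusion-style update.

```python
from dataclasses import dataclass, field
from typing import List

NUM_LABELS = 4  # hypothetical number of semantic classes


@dataclass
class Voxel:
    """One cell of the volume: TSDF value, fusion weight, per-label scores."""
    tsdf: float = 1.0
    weight: float = 0.0
    label_scores: List[float] = field(
        default_factory=lambda: [0.0] * NUM_LABELS
    )

    def integrate(self, sdf_obs: float, label_probs: List[float],
                  obs_weight: float = 1.0) -> None:
        """Fuse one depth observation and its per-pixel label probabilities."""
        # Weighted running average of the signed distance (KinectFusion-style).
        new_weight = self.weight + obs_weight
        self.tsdf = (self.tsdf * self.weight + sdf_obs * obs_weight) / new_weight
        self.weight = new_weight
        # Assumption: label evidence is accumulated as a weighted sum.
        for i, p in enumerate(label_probs):
            self.label_scores[i] += obs_weight * p

    def label_distribution(self) -> List[float]:
        """Normalized label scores; uniform if nothing has been observed."""
        total = sum(self.label_scores)
        if total == 0.0:
            return [1.0 / NUM_LABELS] * NUM_LABELS
        return [s / total for s in self.label_scores]

    def most_likely_label(self) -> int:
        """Index of the label with the highest accumulated score."""
        return max(range(NUM_LABELS), key=lambda i: self.label_scores[i])


def tracking_weight(voxel: Voxel, frame_label: int) -> float:
    """Hypothetical agreement score between a frame label and the map.

    A tracker could use such a value to down-weight correspondences whose
    per-frame label disagrees with the labels stored in the volume.
    """
    return voxel.label_distribution()[frame_label]


if __name__ == "__main__":
    v = Voxel()
    v.integrate(0.5, [0.1, 0.7, 0.1, 0.1])
    v.integrate(0.3, [0.0, 0.9, 0.1, 0.0])
    print(v.tsdf, v.most_likely_label(), tracking_weight(v, 1))
```

In this sketch the same voxel serves both directions of the loop: `integrate` consolidates per-frame labels into the map once a pose is available, and `tracking_weight` feeds the stored labels back to the localization step.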
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
pp. 648-664

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/589910