People tracking is a crucial task in most computer vision applications aimed at analyzing specific behaviors in the sensed area. Practical applications include vision analytics, people counting, etc. In order to properly follow the actions of a single subject, a people tracking framework needs to robustly recognize it from the rest of the surrounding environment, thus allowing proper management of changing positions, occlusions and so on. The recent widespread diffusion of deep learning techniques on almost any kind of computer vision application provides a powerful methodology to address recognition. On the other hand, a large amount of data is required to train state-of-the-art Convolutional Neural Networks (CNN) and this problem is solved, when possible, by means of transfer learning. In this paper, we propose a novel dataset made of nearly 26 thousand samples acquired with a custom stereo camera providing depth according to a fast and accurate stereo algorithm. The dataset includes sequences acquired in different environments with more than 20 different people moving across the sensed area. Once labeled the 26 K images and depth maps of the dataset, we train a head detection module based on state-of-the-art deep network on a portion of the dataset and validate it a different sequence. Finally, we include the head detection module within an existing 3D tracking framework showing that the proposed approach notably improves people detection and tracking accuracy.
Boschini, M., Poggi, M., Mattoccia, S. (2016). Improving the reliability of 3D people tracking system by means of deep-learning. Institute of Electrical and Electronics Engineers Inc. [10.1109/IC3D.2016.7823454].
Improving the reliability of 3D people tracking system by means of deep-learning
POGGI, MATTEO;MATTOCCIA, STEFANO
2016
Abstract
People tracking is a crucial task in most computer vision applications aimed at analyzing specific behaviors in the sensed area. Practical applications include vision analytics, people counting, etc. In order to properly follow the actions of a single subject, a people tracking framework needs to robustly recognize it from the rest of the surrounding environment, thus allowing proper management of changing positions, occlusions and so on. The recent widespread diffusion of deep learning techniques on almost any kind of computer vision application provides a powerful methodology to address recognition. On the other hand, a large amount of data is required to train state-of-the-art Convolutional Neural Networks (CNN) and this problem is solved, when possible, by means of transfer learning. In this paper, we propose a novel dataset made of nearly 26 thousand samples acquired with a custom stereo camera providing depth according to a fast and accurate stereo algorithm. The dataset includes sequences acquired in different environments with more than 20 different people moving across the sensed area. Once labeled the 26 K images and depth maps of the dataset, we train a head detection module based on state-of-the-art deep network on a portion of the dataset and validate it a different sequence. Finally, we include the head detection module within an existing 3D tracking framework showing that the proposed approach notably improves people detection and tracking accuracy.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.