In this article, we propose an augmented reality semiautomatic labeling (ARS), a semiautomatic method which leverages on moving a 2-D camera by means of a robot, proving precise camera tracking, and an augmented reality pen (ARP) to define initial object bounding box, to create large labeled data sets with minimal human intervention. By removing the burden of generating annotated data from humans, we make the deep learning technique applied to computer vision, which typically requires very large data sets, truly automated and reliable. With the ARS pipeline, we created two novel data sets effortlessly, one on electromechanical components (industrial scenario) and other on fruits (daily-living scenario) and trained two state-of-the-art object detectors robustly, based on convolutional neural networks, such as you only look once (YOLO) and single shot detector (SSD). With respect to conventional manual annotation of 1000 frames that takes us slightly more than 10 h, the proposed approach based on ARS allows to annotate 9 sequences of about 35,000 frames in less than 1 h, with a gain factor of about 450. Moreover, both the precision and recall of object detection is increased by about 15% with respect to manual labeling. All our software is available as a robot operating system (ROS) package in a public repository alongside with the novel annotated data sets.

Semiautomatic Labeling for Deep Learning in Robotics

De Gregorio, Daniele
;
Tonioni, Alessio;Palli, Gianluca;Di Stefano, Luigi
2020

Abstract

In this article, we propose an augmented reality semiautomatic labeling (ARS), a semiautomatic method which leverages on moving a 2-D camera by means of a robot, proving precise camera tracking, and an augmented reality pen (ARP) to define initial object bounding box, to create large labeled data sets with minimal human intervention. By removing the burden of generating annotated data from humans, we make the deep learning technique applied to computer vision, which typically requires very large data sets, truly automated and reliable. With the ARS pipeline, we created two novel data sets effortlessly, one on electromechanical components (industrial scenario) and other on fruits (daily-living scenario) and trained two state-of-the-art object detectors robustly, based on convolutional neural networks, such as you only look once (YOLO) and single shot detector (SSD). With respect to conventional manual annotation of 1000 frames that takes us slightly more than 10 h, the proposed approach based on ARS allows to annotate 9 sequences of about 35,000 frames in less than 1 h, with a gain factor of about 450. Moreover, both the precision and recall of object detection is increased by about 15% with respect to manual labeling. All our software is available as a robot operating system (ROS) package in a public repository alongside with the novel annotated data sets.
De Gregorio, Daniele; Tonioni, Alessio; Palli, Gianluca; Di Stefano, Luigi
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11585/702376
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 7
social impact