Lightweight and Effective Convolutional Neural Networks for Vehicle Viewpoint Estimation From Monocular Images

Boschi, Marco (co-first author); Luigi, Luca De (second-to-last author); Salti, Samuele (last author)
2022

Abstract

Vehicle viewpoint estimation from monocular images is a crucial component of autonomous vehicles and of fleet management applications. In this paper, we make several contributions that advance the state of the art on this problem. First, we show that applying a smoothing filter to the output neurons of a Convolutional Neural Network (CNN) is effective when estimating vehicle viewpoint. Second, we point out an overlooked fact: for a fixed viewpoint, the appearance of a vehicle depends strongly on its position in the image plane, which makes viewpoint estimation from appearance alone an ill-posed problem. We show that inserting a CoordConv layer into the model, so that the network receives the coordinates of the vehicle, resolves this ambiguity and greatly improves performance. Finally, we introduce a new data augmentation technique that improves viewpoint estimation for vehicles that are close to the camera or partially occluded. Together, these improvements let a lightweight CNN reach state-of-the-art results while keeping inference time low. An extensive evaluation on a viewpoint estimation benchmark and on actual vehicle camera data shows that our method significantly outperforms the state of the art in vehicle viewpoint estimation, in terms of both accuracy and memory footprint.
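The position cue described in the abstract can be illustrated with a small sketch. The abstract does not say how the vehicle coordinates are encoded, so what follows is only one plausible reading of the general CoordConv idea: two extra channels holding the x and y coordinates of each feature-map cell, expressed in the frame of the full image and normalized to [-1, 1], are appended to the features of a vehicle crop. The function names `coord_channels` and `add_coords`, the bounding-box format, and the normalization are assumptions made for illustration, not the authors' implementation.

```python
def coord_channels(box, image_size, out_size):
    """Build x and y coordinate maps for a vehicle crop (illustrative only).

    box:        (x_min, y_min, x_max, y_max) of the crop, in full-image pixels.
    image_size: (width, height) of the full image.
    out_size:   (cols, rows) of the feature map the channels are appended to.
    Values are normalized to [-1, 1] over the full image (an assumption).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cols, rows = out_size

    def axis(lo, hi, n, extent):
        # Sample n cell centres between lo and hi, then map pixels to [-1, 1].
        step = (hi - lo) / n
        return [2.0 * (lo + (i + 0.5) * step) / extent - 1.0 for i in range(n)]

    xs_row = axis(x_min, x_max, cols, width)
    ys_col = axis(y_min, y_max, rows, height)
    x_map = [list(xs_row) for _ in range(rows)]   # same x values on every row
    y_map = [[y] * cols for y in ys_col]          # same y value along each row
    return x_map, y_map


def add_coords(features, box, image_size):
    """Append the two coordinate maps to a list of channels (each rows x cols)."""
    rows, cols = len(features[0]), len(features[0][0])
    x_map, y_map = coord_channels(box, image_size, (cols, rows))
    return features + [x_map, y_map]


# One 2x4 feature channel for a crop taken from the left half of a 200x100 image.
features = [[[0.0] * 4 for _ in range(2)]]
augmented = add_coords(features, (0, 0, 100, 50), (200, 100))
```

With these extra channels, two crops that look alike but come from different parts of the image produce different inputs to the following convolution, which is the kind of ambiguity the abstract says the CoordConv layer addresses.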
Magistri, Simone; Boschi, Marco; Sambo, Francesco; de Andrade, Douglas Coimbra; Simoncini, Matteo; Kubin, Luca; Taccari, Leonardo; Luigi, Luca De; Salti, Samuele
Files in this item:
File: 2022_T_ITS_Transaction__Viewpoint_Estimation_compressed.pdf
Access: open access
Type: Postprint
License: free open-access license
Size: 3.93 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/905484