Lightweight and Effective Convolutional Neural Networks for Vehicle Viewpoint Estimation From Monocular Images
Boschi, Marco (co-first author);
Luigi, Luca De (penultimate author);
Salti, Samuele (last author)
2022
Abstract
Vehicle viewpoint estimation from monocular images is a crucial component for autonomous driving vehicles and for fleet management applications. In this paper, we make several contributions to advance the state of the art on this problem. We show the effectiveness of applying a smoothing filter to the output neurons of a Convolutional Neural Network (CNN) when estimating vehicle viewpoint. We point out the overlooked fact that, under the same viewpoint, the appearance of a vehicle is strongly influenced by its position in the image plane, which renders viewpoint estimation from appearance an ill-posed problem. We show how, by inserting a CoordConv layer into the model to provide the coordinates of the vehicle, we are able to resolve this ambiguity and greatly increase performance. Finally, we introduce a new data augmentation technique that improves viewpoint estimation on vehicles that are closer to the camera or partially occluded. All these improvements let a lightweight CNN reach optimal results while keeping inference time low. An extensive evaluation on a viewpoint estimation benchmark and on actual vehicle camera data shows that our method significantly outperforms the state of the art in vehicle viewpoint estimation, both in terms of accuracy and memory footprint.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 2022_T_ITS_Transaction__Viewpoint_Estimation_compressed.pdf | Open access | Postprint | Free open-access license | 3.93 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.