Monteagudo, H.P., Taccari, L., Pjetri, A., Sambo, F., Salti, S. (2026). RendBEV: Semantic Perspective View Rendering as Supervision for Bird’s Eye View Segmentation. IEEE ACCESS, 14, 12255-12272 [10.1109/access.2026.3656618].
RendBEV: Semantic Perspective View Rendering as Supervision for Bird’s Eye View Segmentation
Monteagudo, Henrique Pineiro; Salti, Samuele
2026
Abstract
Bird’s Eye View (BEV) semantic maps have recently garnered a lot of attention as a useful representation of the environment to tackle assisted and autonomous driving tasks. However, most of the existing work focuses on the fully supervised setting, training neural networks on large annotated datasets. In this work, we present RendBEV, a new method to train BEV semantic segmentation networks without direct BEV supervision. We leverage rendering with neural density fields or monocular depth estimation models to shift the supervision to semantic perspective views, where targets can be computed by a 2D semantic segmentation model. Through extensive experimental work on the KITTI-360 and nuScenes datasets, we show that RendBEV enables BEV semantic segmentation with no BEV supervision, and delivers competitive results in this challenging setting. When used as pretraining to then fine-tune on labeled BEV ground truth, our method boosts performance in low-annotation regimes, outperforming models trained from scratch and improving upon competing methods (on nuScenes) or being on-par with them (on KITTI-360).

| File | Size | Format |
|---|---|---|
| RendBEV_Semantic_Perspective_View_Rendering_as_Supervision_for_Birds_Eye_View_Segmentation.pdf (open access; Version of Record; License: Creative Commons Attribution-NonCommercial-NoDerivatives, CC BY-NC-ND) | 3.63 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.