Moosmann, J., Mandula, J., Mayer, P., Benini, L., & Magno, M. (2023). Quantitative Evaluation of a Multi-Modal Camera Setup for Fusing Event Data with RGB Images. IEEE. https://doi.org/10.1109/SENSORS56945.2023.10325041
Quantitative Evaluation of a Multi-Modal Camera Setup for Fusing Event Data with RGB Images
Benini, Luca; Magno, Michele
2023
Abstract
Event-based cameras, also called silicon retinas, have the potential to revolutionize computer vision by detecting significant intensity changes and reporting them as asynchronous events, offering extended dynamic range, low latency, and low power consumption that enable a wide range of applications from autonomous driving to long-term surveillance. Because the technology is still emerging, publicly available datasets that combine event-based systems with frame-based cameras, and thereby allow the benefits of both technologies to be exploited, remain scarce. This work quantitatively evaluates a multi-modal camera setup for fusing high-resolution dynamic vision sensor (DVS) data with RGB image data via static camera alignment. The proposed setup, which is intended for semi-automatic DVS data labeling, combines two recently released Prophesee EVK4 DVS cameras and one global-shutter XIMEA MQ022CG-CM RGB camera. After alignment, state-of-the-art object detection or segmentation networks label the image data, and the resulting bounding boxes or labeled pixels are mapped directly onto the aligned events. To facilitate this process, various time-based synchronization methods for DVS data are analyzed, and calibration accuracy, camera alignment, and lens impact are evaluated. Experimental results demonstrate the benefits of the proposed system: the best synchronization method yields an image calibration error of less than 0.90 px and a pixel cross-correlation deviation of 1.6 px, while a lens with 8 mm focal length enables detection of objects 30 cm in size at a distance of 350 m against a homogeneous background.
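The label-transfer step described in the abstract can be pictured with a small sketch: under a static alignment, detections obtained on the RGB frames are warped onto the DVS pixel grid and applied to the co-registered events. The code below is a minimal illustration assuming the alignment is approximated by a single homography estimated from matched calibration points; all function and variable names are hypothetical and not taken from the paper's implementation.

```python
# Hypothetical sketch: transfer RGB bounding boxes onto the aligned DVS pixel grid.
# Assumes the static camera alignment can be approximated by one 3x3 homography H
# estimated from corresponding calibration points in both sensors.
import numpy as np
import cv2

def estimate_alignment_homography(rgb_points, dvs_points):
    """Estimate a homography mapping RGB pixel coordinates to DVS pixel coordinates.

    rgb_points, dvs_points: (N, 2) arrays of corresponding calibration points.
    """
    H, _ = cv2.findHomography(rgb_points.astype(np.float32),
                              dvs_points.astype(np.float32),
                              method=cv2.RANSAC)
    return H

def map_bounding_box(H, box):
    """Warp an axis-aligned RGB box (x1, y1, x2, y2) into the DVS frame.

    The four corners are projected with H and a new axis-aligned box is taken
    around them, which is sufficient for transferring detection labels to events.
    """
    x1, y1, x2, y2 = box
    corners = np.float32([[x1, y1], [x2, y1], [x2, y2], [x1, y2]]).reshape(-1, 1, 2)
    warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    xs, ys = warped[:, 0], warped[:, 1]
    return float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())
```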
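Likewise, one simple form of the time-based synchronization analyzed in the paper can be sketched as selecting the events whose timestamps fall inside an RGB frame's exposure window, so that the warped labels are applied only to events captured during that frame. The routine below assumes both sensors report timestamps on a common clock after synchronization and is an illustrative assumption, not the paper's method.

```python
# Hypothetical sketch of timestamp-based event/frame association on a shared clock.
import numpy as np

def events_in_exposure(event_ts_us, frame_start_us, exposure_us):
    """Return a boolean mask selecting events captured during one RGB exposure.

    event_ts_us: sorted (N,) array of event timestamps in microseconds.
    frame_start_us, exposure_us: exposure start time and duration of the frame.
    """
    lo = np.searchsorted(event_ts_us, frame_start_us, side="left")
    hi = np.searchsorted(event_ts_us, frame_start_us + exposure_us, side="right")
    mask = np.zeros(event_ts_us.shape[0], dtype=bool)
    mask[lo:hi] = True
    return mask
```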