Marcon, M., Spezialetti, R., Salti, S., Silva, L., Di Stefano, L. (2022). Unsupervised Learning of Local Equivariant Descriptors for Point Clouds. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 44(12), 9687-9702 [10.1109/TPAMI.2021.3126713].
Unsupervised Learning of Local Equivariant Descriptors for Point Clouds
Marcon, Marlon; Spezialetti, Riccardo; Salti, Samuele; Silva, L.; Di Stefano, Luigi
2022
Abstract
Correspondences between 3D keypoints generated by matching local descriptors are a key step in 3D computer vision and graphics applications. Learned descriptors are rapidly evolving and outperforming the classical handcrafted approaches in the field. Yet, to learn effective representations they require supervision through labeled data, which are cumbersome and time-consuming to obtain. Unsupervised alternatives exist, but they lag in performance. Moreover, invariance to viewpoint changes is attained either by relying on data augmentation, which is prone to degrading upon generalization to unseen datasets, or by learning from handcrafted representations of the input which are already rotation invariant but whose effectiveness at training time may significantly affect the learned descriptor. We show how learning an equivariant 3D local descriptor, instead of an invariant one, can overcome both issues. LEAD (Local EquivAriant Descriptor) combines Spherical CNNs, to learn an equivariant representation, with plane-folding decoders, to learn without supervision. Through extensive experiments on standard surface registration datasets, we show that our proposal outperforms existing unsupervised methods by a large margin and achieves competitive results against supervised approaches, especially in the practically very relevant scenario of transfer learning.
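The abstract's core idea is the equivariance property: a descriptor f is rotation equivariant when rotating the input point cloud rotates the descriptor in a predictable way, f(R·X) = ρ(R)·f(X), rather than leaving it unchanged (invariance). The sketch below is only a toy illustration of this property with a trivially equivariant map (the centroid); it is not LEAD's actual Spherical-CNN descriptor, and all names in it are hypothetical.

```python
import numpy as np

def toy_equivariant_descriptor(points):
    """A trivially rotation-equivariant 'descriptor': the centroid of the
    local point set. A stand-in for a learned equivariant map f with
    f(R @ X) = R @ f(X); LEAD learns such a map with Spherical CNNs."""
    return points.mean(axis=0)

def random_rotation(rng):
    # QR decomposition of a Gaussian matrix gives a random orthogonal matrix.
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.sign(np.diag(r))   # fix a sign convention on the columns
    if np.linalg.det(q) < 0:   # force det = +1, i.e. a proper rotation
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))   # a toy local point cloud
R = random_rotation(rng)

# Equivariance check: describing the rotated cloud equals rotating the descriptor.
d_of_rotated = toy_equivariant_descriptor(X @ R.T)
rotated_d = R @ toy_equivariant_descriptor(X)
print(np.allclose(d_of_rotated, rotated_d))  # → True
```

An invariant descriptor would instead satisfy f(R·X) = f(X); the paper argues that keeping the rotation information in the representation (equivariance) avoids both the data-augmentation and the handcrafted-input routes to viewpoint invariance.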
File | Size | Format | Access
---|---|---|---
21_PAMI_Learning_an equivariant_descriptor_postprint-compressed.pdf (postprint, free-access license) | 1.56 MB | Adobe PDF | Open access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.