Goyo, M., Frisoni, G., Moro, G., Sartori, C. (2024). Refining Triplet Sampling for Improved Self-Supervised Representation Learning.
Refining Triplet Sampling for Improved Self-Supervised Representation Learning
Giacomo Frisoni (co-first author);
Gianluca Moro (co-first author);
Claudio Sartori (co-first author)
2024
Abstract
Self-supervised representation learning extracts meaningful features from data without explicit supervision, building a space with desired properties. Contrastive learning has emerged as the predominant approach to clustering similar data points and separating dissimilar ones within the embedding space. Although creating different views of the same data (e.g., cropping, rotation) emphasizes similarities without labels, current methods struggle to define negative examples. Several algorithms consider only positive examples, or integrate dissimilarity measures into their loss functions by computing average distances within the same batch. However, they do not capture nuanced differences effectively, risking the collapse of data points into a single location. In this paper, we propose a novel technique, termed "Refined Triplet Sampling" (ReTSam), to generate synthetic negative vectors for contrastive learning. Mechanically, for each element in the batch, we identify its k-nearest neighbors and designate their centroid as a hard negative for a triplet loss methodology. We test ReTSam on two widely used image datasets, namely CIFAR-10 and SVHN, considering content-based image retrieval and classification tasks. Our findings demonstrate that, despite its simplicity, ReTSam not only promotes the learning of similarity but also significantly improves that of dissimilarity (with a +5% increase in Mean Average Precision on CIFAR-10), resulting in superior performance in practical scenarios.
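The mechanism described in the abstract (take each batch element's k nearest neighbors, use their centroid as a hard negative in a triplet loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `knn_centroid_negatives` and `triplet_loss`, the use of Euclidean distance, the exclusion of each point from its own neighbor set, and the values `k=3` and `margin=1.0` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def knn_centroid_negatives(embeddings, k=3):
    """Return, for each row, the centroid of its k nearest batch neighbors.

    Assumption: neighbors are found with squared Euclidean distance and a
    point is never its own neighbor.
    """
    x = np.asarray(embeddings, dtype=float)
    # Pairwise squared distances, shape (B, B).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    # Exclude each point from its own neighbor set.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest other points, shape (B, k).
    idx = np.argsort(dist, axis=1)[:, :k]
    # Average the neighbors to obtain one synthetic negative per row, shape (B, D).
    return x[idx].mean(axis=1)


def triplet_loss(anchors, positives, negatives, margin=1.0):
    """Standard margin-based triplet loss, averaged over the batch."""
    d_pos = np.linalg.norm(anchors - positives, axis=1)
    d_neg = np.linalg.norm(anchors - negatives, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


# Toy batch: random anchors, with positives simulated as slightly
# perturbed copies (a stand-in for augmented views of the same image).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.05 * rng.normal(size=(8, 16))
negatives = knn_centroid_negatives(anchors, k=3)
loss = triplet_loss(anchors, positives, negatives)
```

In this sketch the negative is a synthetic vector rather than an actual batch element, which matches the abstract's description of "synthetic negative vectors". In a real training setup the embeddings would come from an encoder and the loss would be computed in a differentiable framework.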