Asperti, A., Naldi, L., Fiorilla, S. (2025). An Investigation of the Domain Gap in CLIP-Based Person Re-Identification. Sensors, 25(2), 1-19. https://doi.org/10.3390/s25020363
An Investigation of the Domain Gap in CLIP-Based Person Re-Identification
Asperti, Andrea; Fiorilla, Salvatore
2025
Abstract
Person re-identification (re-id) is a critical computer vision task aimed at identifying individuals across multiple non-overlapping cameras, with wide-ranging applications in intelligent surveillance systems. Despite recent advances, the domain gap—performance degradation when models encounter unseen datasets—remains a persistent challenge. CLIP-based models, leveraging multimodal pre-training, offer potential for mitigating this issue by aligning visual and textual representations. In this study, we provide a comprehensive quantitative analysis of the domain gap in CLIP-based re-id systems across standard benchmarks, including Market-1501, DukeMTMC-reID, MSMT17, and Airport, simulating real-world deployment conditions. We systematically measure the performance of these models in terms of mean average precision (mAP) and Rank-1 accuracy, offering insights into the challenges faced during dataset transitions. Our analysis highlights the specific advantages introduced by CLIP’s visual–textual alignment and evaluates its contribution relative to strong image encoder baselines. Additionally, we assess the impact of extending training sets with non-domain-specific data and incorporating random erasing augmentation, achieving an average improvement of +4.3% in mAP and +4.0% in Rank-1 accuracy. Our findings underscore the importance of standardized benchmarks and systematic evaluations for enhancing reproducibility and guiding future research. This work contributes to a deeper understanding of the domain gap in re-id, while highlighting pathways for improving model robustness and generalization in diverse, real-world scenarios.
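The abstract reports results in terms of mean average precision (mAP) and Rank-1 accuracy. As a rough illustration of what these re-id metrics compute over a ranked gallery, here is a minimal, dependency-free sketch; the function name and toy data are invented for illustration, and it omits the same-camera filtering that real benchmark protocols such as Market-1501's apply:

```python
def evaluate_reid(query_feats, query_ids, gallery_feats, gallery_ids):
    """Toy mAP and Rank-1 computation for person re-identification.

    Each query/gallery feature is a plain list of floats (assumed
    L2-normalised, so the dot product acts as cosine similarity).
    Returns (mAP, Rank-1 accuracy) over all queries.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    aps = []
    rank1_hits = 0
    for qf, qid in zip(query_feats, query_ids):
        # Rank gallery entries by similarity to the query, best first.
        sims = [dot(qf, gf) for gf in gallery_feats]
        order = sorted(range(len(sims)), key=lambda j: -sims[j])
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue  # no ground-truth match in the gallery
        rank1_hits += int(matches[0])  # Rank-1: top result is correct
        # Average precision: mean of precision@k at each correct hit.
        hits, ap = 0, 0.0
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                ap += hits / k
        aps.append(ap / hits)
    return sum(aps) / len(aps), rank1_hits / len(query_ids)
```

For example, a query of identity 1 whose true match is ranked second out of two gallery images yields an AP of 0.5 and a Rank-1 of 0; domain-gap studies like this one track how far both numbers drop when the gallery comes from an unseen dataset.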
File: sensors-25-00363-v2.pdf (open access)
Type: publisher's PDF version
License: Open Access licence, Creative Commons Attribution (CC BY)
Size: 9.59 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.