Zhou, Y., Sun, G., Li, Y., Xie, G., Benini, L., Konukoglu, E. (2025). When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation. VISUAL INTELLIGENCE, 3, 1-14 [10.1007/s44267-025-00082-1].

When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation

Zhou, Yuli; Sun, Guolei; Li, Yawei; Xie, Guo-Sen; Benini, Luca; Konukoglu, Ender

2025

Abstract

This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures or poor lighting conditions. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. However, its effectiveness in dynamic camouflaged scenarios remains under-explored. This work presents a comprehensive evaluation of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we adapt SAM2 by fine-tuning it on a video camouflaged dataset. Our experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by fine-tuning SAM2’s parameters for VCOS.

Year	2025
Journal	VISUAL INTELLIGENCE
DOI	https://dx.doi.org/10.1007/s44267-025-00082-1
All authors	Zhou, Yuli; Sun, Guolei; Li, Yawei; Xie, Guo-Sen; Benini, Luca; Konukoglu, Ender
Appears in categories:	1.01 Journal article

Files in this record:

File	Size	Format
When SAM2 meets video camouflaged object segmentation a comprehensive evaluation and adaptation.pdf (open access; publisher's version / Version of Record; license: Creative Commons Attribution (CC BY))	2.44 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1039444
