This study investigates the application and performance of the Segment Anything Model 2 (SAM2) in the challenging task of video camouflaged object segmentation (VCOS). VCOS involves detecting objects in videos that blend seamlessly into their surroundings because of similar colors and textures and poor lighting conditions. Compared with objects in normal scenes, camouflaged objects are much more difficult to detect. SAM2, a video foundation model, has shown potential in various tasks. However, its effectiveness in dynamic camouflaged scenarios remains under-explored. This work presents a comprehensive evaluation of SAM2’s ability in VCOS. First, we assess SAM2’s performance on camouflaged video datasets using different models and prompts (click, box, and mask). Second, we explore the integration of SAM2 with existing multimodal large language models (MLLMs) and VCOS methods. Third, we adapt SAM2 by fine-tuning it on a video camouflaged dataset. Our experiments demonstrate that SAM2 has excellent zero-shot ability to detect camouflaged objects in videos. We also show that this ability can be further improved by adjusting SAM2’s parameters specifically for VCOS.

Zhou, Y., Sun, G., Li, Y., Xie, G., Benini, L., Konukoglu, E. (2025). When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation. VISUAL INTELLIGENCE, 3, 1-14 [10.1007/s44267-025-00082-1].

When SAM2 meets video camouflaged object segmentation: a comprehensive evaluation and adaptation

Benini, Luca
2025

Zhou, Yuli; Sun, Guolei; Li, Yawei; Xie, Guo-Sen; Benini, Luca; Konukoglu, Ender
Files in this record:
File: When SAM2 meets video camouflaged object segmentation a comprehensive evaluation and adaptation.pdf
Access: open access
Description: publisher's version
Type: Publisher's version (PDF) / Version of Record
License: Open Access license. Creative Commons Attribution (CC BY)
Size: 2.44 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1039444