FreeVPS: Repurposing Training-Free SAM2 for Generalizable Video Polyp Segmentation

📅 2025-08-27
🤖 AI Summary
Existing video polyp segmentation (VPS) methods struggle to balance spatiotemporal modeling and domain generalization, while SAM2 accumulates errors during long-term colonoscopy tracking, which degrades performance. To address these issues, we propose a training-free, dual-module VPS framework that recasts VPS as a “track-by-detect” task. An intra-association filtering module suppresses false positives from the detection stage, and an inter-association refinement module adaptively updates the memory bank, mitigating the snowball effect of accumulated errors. The method builds on SAM2’s spatiotemporal modeling and achieves both in-domain and cross-domain generalization solely through inference-time association filtering and memory refinement. On long, untrimmed colonoscopy videos, it improves segmentation accuracy and temporal consistency, and it attains state-of-the-art performance across multiple benchmarks. Because it needs no additional training, the framework shows strong potential for clinical deployment, offering zero-shot adaptability and computational efficiency.

📝 Abstract
Existing video polyp segmentation (VPS) paradigms usually struggle to balance spatiotemporal modeling and domain generalization, which limits their applicability in real clinical scenarios. To address this challenge, we recast the VPS task as a track-by-detect paradigm that leverages the spatial contexts captured by the image polyp segmentation (IPS) model while integrating the temporal modeling capabilities of Segment Anything Model 2 (SAM2). However, during long-term polyp tracking in colonoscopy videos, SAM2 suffers from error accumulation, resulting in a snowball effect that compromises segmentation stability. We mitigate this issue by repurposing SAM2 as a video polyp segmenter with two training-free modules. In particular, the intra-association filtering module eliminates spatial inaccuracies originating from the detection stage, reducing false positives. The inter-association refinement module adaptively updates the memory bank to prevent error propagation over time, enhancing temporal coherence. Both modules work synergistically to stabilize SAM2, achieving cutting-edge performance in both in-domain and out-of-domain scenarios. Furthermore, we demonstrate the robust tracking capabilities of FreeVPS in long, untrimmed colonoscopy videos, underscoring its potential for reliable clinical analysis.
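The abstract describes intra-association filtering only at a high level. The following rough illustration assumes two things: the IPS model yields several candidate masks per frame, and SAM2 yields a propagated mask for the same frame. Under those assumptions, one simple way to reject spatially inconsistent detections is to keep only the candidates whose IoU with the propagated mask clears a threshold. The function names and the 0.5 threshold are hypothetical. This IoU gate is a stand-in for illustration and is not the association criterion used in FreeVPS.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks; two empty masks count as a perfect match."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

def filter_candidates(candidates, propagated, iou_thr=0.5):
    """Hypothetical intra-frame filter (an assumption, not the FreeVPS rule):
    keep only the detector's candidate masks that agree with SAM2's propagated mask."""
    return [c for c in candidates if mask_iou(c, propagated) >= iou_thr]

# Toy 4x4 frame: the propagated mask covers the top-left 2x2 block.
propagated = np.zeros((4, 4), dtype=bool)
propagated[:2, :2] = True

true_hit = propagated.copy()              # candidate matching the tracked polyp
false_pos = np.zeros((4, 4), dtype=bool)
false_pos[3, 3] = True                    # isolated blob elsewhere in the frame

kept = filter_candidates([true_hit, false_pos], propagated)
print(len(kept))  # 1
```

A fixed IoU threshold is the simplest possible association test. It keeps the example short, but it would also discard a correct detection whenever the propagated mask has itself drifted.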

Problem

Research questions and friction points this paper is trying to address.

Balancing spatiotemporal modeling and domain generalization in VPS
Mitigating error accumulation in SAM2 for long-term polyp tracking
Enhancing segmentation stability and temporal coherence without training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Repurposing SAM2 for polyp segmentation
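Two training-free modules (intra-association filtering and inter-association refinement) that reduce false positives and error accumulation
Adaptive memory bank updates that prevent error propagation and improve temporal coherence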
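SAM2 conditions each frame on a memory bank that holds prompted frames plus a FIFO queue of recent frames, so a single bad mask written into memory can keep influencing later predictions. The following sketch illustrates only the general idea of gating memory writes. The class name, the capacity of 6, and the rule of comparing each new mask against the last admitted one are all assumptions made for illustration. None of them describes the inter-association refinement module from the paper.

```python
from collections import deque
from typing import Optional, Tuple

import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two binary masks; two empty masks count as a perfect match."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

class GatedMemoryBank:
    """Hypothetical memory bank: one prompt frame that is never evicted plus a
    FIFO of up to `capacity` recent frames. A new mask is written only if it
    overlaps the most recently admitted mask by at least `iou_thr`."""

    def __init__(self, capacity: int = 6, iou_thr: float = 0.5):
        self.prompt: Optional[Tuple[int, np.ndarray]] = None
        self.recent: deque = deque(maxlen=capacity)  # oldest entry drops out when full
        self.iou_thr = iou_thr

    def _reference(self) -> np.ndarray:
        # Compare against the newest trusted mask, falling back to the prompt frame.
        return self.recent[-1][1] if self.recent else self.prompt[1]

    def update(self, frame_idx: int, mask: np.ndarray) -> bool:
        """Try to write a frame into memory; return True if it was admitted."""
        mask = np.asarray(mask, dtype=bool)
        if self.prompt is None:
            self.prompt = (frame_idx, mask)  # first frame acts as the prompt
            return True
        if mask_iou(mask, self._reference()) < self.iou_thr:
            return False  # likely drift or a false positive: keep it out of memory
        self.recent.append((frame_idx, mask))
        return True

    def frame_indices(self) -> list:
        """Indices of the frames currently held in memory."""
        if self.prompt is None:
            return []
        return [self.prompt[0]] + [idx for idx, _ in self.recent]

# Usage: frame 2 is a stray blob and is rejected; capacity=2 later evicts frame 1.
bank = GatedMemoryBank(capacity=2)
polyp = np.zeros((4, 4), dtype=bool)
polyp[:2, :2] = True
stray = np.zeros((4, 4), dtype=bool)
stray[3, 3] = True

for idx, m in enumerate([polyp, polyp, stray, polyp, polyp]):
    bank.update(idx, m)
print(bank.frame_indices())  # [0, 3, 4]
```

Rejected frames are skipped, so the prompt frame and the most recent trusted frames keep conditioning later predictions. A practical system would also need a way to re-admit frames when the polyp legitimately moves or changes shape, and this fixed threshold does not handle that case.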
Qiang Hu
Huazhong University of Science and Technology
Ying Zhou
Huazhong University of Science and Technology
Gepeng Ji
Australian National University
Nick Barnes
Professor, Australian National University
Computer Vision, 3D Vision, Saliency, Prosthetic Vision, Cognitive Vision
Qiang Li
Huazhong University of Science and Technology
Zhiwei Wang
Huazhong University of Science and Technology