🤖 AI Summary
To address challenges in Large-Scale Video Object Segmentation (LSVOS)—including object re-appearance, small-scale instances, severe occlusion, and crowded scenes—this paper proposes an efficient segmentation framework built upon SAM2. The method introduces two key innovations: (1) a long-term memory module to enhance cross-frame object re-identification robustness, and (2) a SAM2Long post-processing strategy that explicitly models long-range temporal dependencies to mitigate error accumulation. By integrating these components, the framework achieves significantly improved segmentation stability without compromising inference efficiency. Evaluated on the MOSE test set, it attains a J&F score of 0.8427, ranking third in the ICCV 2025 LSVOS Challenge. This result validates the effectiveness of synergistically combining long-term memory with explicit long-horizon temporal modeling for robust video object segmentation.
📝 Abstract
Large-scale Video Object Segmentation (LSVOS) addresses the challenge of accurately tracking and segmenting objects in long video sequences, where difficulties stem from object reappearance, small-scale targets, heavy occlusions, and crowded scenes. Existing approaches predominantly adopt SAM2-based frameworks with various memory mechanisms for complex video mask generation. In this report, we propose Segment Anything with Memory Strengthened Object Navigation (SAMSON), the 3rd-place solution in the MOSE track of ICCV 2025, which integrates the strengths of state-of-the-art VOS models into an effective paradigm. To handle visually similar instances and long-term object disappearance in MOSE, we incorporate a long-term memory module for reliable object re-identification. Additionally, we adopt SAM2Long as a post-processing strategy to reduce error accumulation and enhance segmentation stability in long video sequences. Our method achieved a final performance of 0.8427 in terms of J&F on the test-set leaderboard.
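The abstract describes a long-term memory module used to re-identify objects after long disappearances. The paper's actual module is not specified here; the following is a minimal, hypothetical sketch of the general idea — keeping a bank of appearance embeddings per object and matching a newly detected instance against it by cosine similarity. The class name `LongTermMemory`, the capacity, and the threshold are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

class LongTermMemory:
    """Hypothetical long-term memory bank for object re-identification.

    Stores a bounded history of appearance embeddings per object id and
    matches a query embedding against all stored entries.
    """

    def __init__(self, capacity=64, match_threshold=0.7):
        self.capacity = capacity            # max stored embeddings per object
        self.match_threshold = match_threshold
        self.bank = {}                      # object_id -> list of embeddings

    def update(self, object_id, embedding):
        """Append an embedding for an object, evicting the oldest when full."""
        entries = self.bank.setdefault(object_id, [])
        entries.append(np.asarray(embedding, dtype=np.float64))
        if len(entries) > self.capacity:
            entries.pop(0)

    def reidentify(self, query_embedding):
        """Return the best-matching stored object id, or None if no stored
        embedding is similar enough (i.e. the instance looks new)."""
        best_id, best_sim = None, self.match_threshold
        for object_id, entries in self.bank.items():
            sim = max(cosine_similarity(query_embedding, e) for e in entries)
            if sim > best_sim:
                best_id, best_sim = object_id, sim
        return best_id
```

A reappearing object whose embedding stays close to its stored history is mapped back to its original id, while a dissimilar instance yields `None` and would be treated as a new object; this is the behavior the abstract attributes to the long-term memory module, reduced to its simplest form.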