🤖 AI Summary
To address the challenges of object segmentation in videos exhibiting complex motion and long temporal durations, this paper proposes a pseudo-label-driven adaptive multi-model collaboration framework. Our method establishes a dual-model baseline comprising a fine-tuned SAM2 and an unsupervised video model (TMO), which jointly generate high-quality pseudo-labels during inference. An adaptive model selection mechanism then dynamically assigns the optimal model to each video segment based on these pseudo-labels. Crucially, this is the first approach to enable online, annotation-free model switching, substantially improving segmentation robustness and cross-scenario generalization. Evaluated on the 2025 PVUW MOSE challenge test set, our method achieves a new state-of-the-art J&F score of 87.26%, securing first place in the competition.
📝 Abstract
Segmentation of video objects in complex scenarios is highly challenging, and the MOSE dataset has significantly contributed to the development of this field. This technical report details the STSeg solution proposed by the "imaplus" team. By fine-tuning SAM2 and the unsupervised model TMO on the MOSE dataset, STSeg demonstrates clear advantages in handling complex object motions and long video sequences. In the inference phase, an Adaptive Pseudo-labels Guided Model Refinement Pipeline selects the most appropriate model for processing each video. With the fine-tuned models and this pipeline, STSeg achieved a J&F score of 87.26% on the test set of the 2025 4th PVUW Challenge MOSE Track, securing 1st place and advancing video object segmentation in complex scenarios.
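The adaptive, annotation-free model selection described above can be illustrated with a minimal sketch. The assumption here is that both models segment each video, a consensus pseudo-label is formed from their agreement, and the model whose masks align best with that pseudo-label is chosen for the video. All function names and the IoU-based consensus rule are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of pseudo-label-guided per-video model selection.
# Assumption: the pseudo-label is the pixelwise agreement (logical AND)
# of the two models' binary masks; the model agreeing more closely with
# it (mean IoU over frames) is selected. Not the paper's exact method.
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 1.0


def select_model(masks_sam2, masks_tmo):
    """Return which model to trust for this video, based on agreement
    with consensus pseudo-labels built from both models' predictions."""
    pseudo = [np.logical_and(s, t) for s, t in zip(masks_sam2, masks_tmo)]
    score_sam2 = np.mean([iou(s, p) for s, p in zip(masks_sam2, pseudo)])
    score_tmo = np.mean([iou(t, p) for t, p in zip(masks_tmo, pseudo)])
    return "SAM2" if score_sam2 >= score_tmo else "TMO"
```

In this toy rule, a model that over-segments relative to the consensus scores lower, so the tighter predictor wins that video; the actual pipeline presumably uses a more refined quality measure on the pseudo-labels.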