FVOS for MOSE Track of 4th PVUW Challenge: 3rd Place Solution

📅 2025-04-13
🤖 AI Summary
To address low segmentation accuracy in video object segmentation (VOS) under complex real-world scenarios, this work proposes FVOS, a fine-tuning framework. First, an existing pre-trained VOS model is fine-tuned on the target dataset. Second, a morphological post-processing step, using opening and closing operations, repairs the excessively large gaps between adjacent objects that appear in single-model predictions. Third, segmentation results obtained at multiple scales are combined by weighted voting fusion to produce the final output. On the MOSE Track of the 4th PVUW Challenge (2025), the method achieves J&F scores of 76.81% in the validation stage and 83.92% in the testing stage, ranking third overall. The framework is aimed at the segmentation degradation caused by dynamic backgrounds, occlusions, and small objects.

📝 Abstract
Video Object Segmentation (VOS) is one of the most fundamental and challenging tasks in computer vision and has a wide range of applications. Most existing methods rely on spatiotemporal memory networks to extract frame-level features and have achieved promising results on commonly used datasets. However, these methods often struggle in more complex real-world scenarios. This paper addresses this issue, aiming to achieve accurate segmentation of video objects in challenging scenes. We propose fine-tuning VOS (FVOS), optimizing existing methods for specific datasets through tailored training. Additionally, we introduce a morphological post-processing strategy to address the issue of excessively large gaps between adjacent objects in single-model predictions. Finally, we apply a voting-based fusion method on multi-scale segmentation results to generate the final output. Our approach achieves J&F scores of 76.81% and 83.92% during the validation and testing stages, respectively, securing third place overall in the MOSE Track of the 4th PVUW challenge 2025.
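
For readers unfamiliar with the reported metric: the abstract does not restate how J&F is computed. The block below gives the definition commonly used in VOS benchmarks, under the assumption that the challenge follows the same convention. The symbols $M$ (predicted mask), $G$ (ground-truth mask), $P_c$ and $R_c$ (contour precision and recall) are introduced here for illustration and do not appear in the source.

```latex
\mathcal{J} = \frac{|M \cap G|}{|M \cup G|}, \qquad
\mathcal{F} = \frac{2\,P_c\,R_c}{P_c + R_c}, \qquad
\mathcal{J}\&\mathcal{F} = \frac{\mathcal{J} + \mathcal{F}}{2}
```

Here $\mathcal{J}$ measures region similarity (intersection over union), $\mathcal{F}$ measures contour accuracy, and the reported score is their mean.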
Problem

Research questions and friction points this paper is trying to address.

Improving video object segmentation in complex real-world scenarios
Optimizing VOS methods via fine-tuning for specific datasets
Reducing gaps between objects using morphological post-processing
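
The morphological post-processing idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the square structuring element, and the radius `k` are all assumptions made for illustration, and the code is written in pure Python so that it is self-contained. In practice one would more likely call a library routine such as `cv2.morphologyEx` or `scipy.ndimage.binary_closing`.

```python
def dilate(mask, k=1):
    """Binary dilation with a (2k+1) x (2k+1) square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel becomes foreground if any in-bounds neighbour is foreground.
            out[y][x] = int(any(
                mask[yy][xx]
                for yy in range(max(0, y - k), min(h, y + k + 1))
                for xx in range(max(0, x - k), min(w, x + k + 1))
            ))
    return out


def erode(mask, k=1):
    """Binary erosion; pixels outside the image are treated as background."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    # Pixels closer than k to the border can never survive under this convention.
    for y in range(k, h - k):
        for x in range(k, w - k):
            out[y][x] = int(all(
                mask[yy][xx]
                for yy in range(y - k, y + k + 1)
                for xx in range(x - k, x + k + 1)
            ))
    return out


def closing(mask, k=1):
    """Dilation followed by erosion: fills gaps narrower than the element."""
    return erode(dilate(mask, k), k)


def opening(mask, k=1):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(mask, k), k)


# Two blobs separated by a one-pixel-wide gap (hypothetical example mask).
gap_mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = closing(gap_mask)

# A 3x3 blob plus an isolated noise pixel at row 1, column 5.
noisy_mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
opened = opening(noisy_mask)
```

Closing bridges the gap between the two blobs, while opening removes the isolated pixel and leaves the 3x3 blob intact. Because erosion treats out-of-image pixels as background, regions touching the image border can shrink under this sketch; a production version would need to choose the border policy deliberately.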
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning existing VOS methods for a specific dataset
Morphological post-processing for gap reduction
Voting-based fusion on multi-scale results
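
A voting-based fusion of multi-scale results could look roughly like the sketch below. It assumes each scale's prediction has already been resized to a common resolution and stored as a 2D grid of integer object IDs (0 for background). The function name `vote_fuse`, the optional per-scale `weights`, and the tie-breaking rule are hypothetical choices for illustration, not details taken from the paper.

```python
from collections import Counter


def vote_fuse(label_maps, weights=None):
    """Fuse same-sized object-ID maps by a per-pixel (optionally weighted) vote."""
    if weights is None:
        weights = [1.0] * len(label_maps)  # plain majority vote
    h, w = len(label_maps[0]), len(label_maps[0][0])
    fused = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            score = Counter()
            for label_map, weight in zip(label_maps, weights):
                score[label_map[y][x]] += weight
            # Highest total weight wins; ties go to the smallest label ID
            # so the result is deterministic (an assumption of this sketch).
            fused[y][x] = min(score, key=lambda lab: (-score[lab], lab))
    return fused


# Three hypothetical predictions for a 1 x 3 image, e.g. from three input scales.
pred_a = [[1, 1, 0]]
pred_b = [[1, 2, 0]]
pred_c = [[0, 2, 2]]

majority = vote_fuse([pred_a, pred_b, pred_c])
weighted = vote_fuse([pred_a, pred_b, pred_c], weights=[1.0, 1.0, 3.0])
```

With equal weights each pixel takes the label that most scales agree on; raising one scale's weight lets it override the other two where they disagree with it.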
Mengjiao Wang
International Joint Research Center for Intelligent Perception and Computation, Xi’an, China
Junpei Zhang
International Joint Research Center for Intelligent Perception and Computation, Xi’an, China
Xu Liu
International Joint Research Center for Intelligent Perception and Computation, Xi’an, China
Yuting Yang
International Joint Research Center for Intelligent Perception and Computation, Xi’an, China
Mengru Ma
Xidian University
Fusion Classification, Remote Sensing Intelligent Interpretation