🤖 AI Summary
Addressing three core challenges in complex video object segmentation—detection of small/low-resolution objects, robustness to severe occlusion, and dynamic scene modeling—this paper proposes the MASSeg framework and the MOSE+ enhanced dataset. Methodologically: (1) we introduce MOSE+, the first benchmark explicitly designed for occlusion-aware and motion-aware video segmentation; (2) we propose inter-frame consistency-aware data augmentation to explicitly model both consistent and inconsistent motion patterns; and (3) we devise an adaptive mask scaling inference mechanism that dynamically aligns mask resolution with multi-scale object sizes. The approach integrates multi-scale feature aggregation, cross-frame consistency regularization, and hybrid augmentation strategies. On the MOSE test set, MASSeg achieves J=0.8250, F=0.9007, and J&F=0.8628, securing second place in the MOSE track of the CVPR 2025 PVUW Challenge.
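The inter-frame consistency-aware augmentation described above can be illustrated with a minimal sketch: a "consistent" mode applies one shared transform to every frame of a clip (mimicking coherent motion), while an "inconsistent" mode draws an independent transform per frame (mimicking abrupt appearance changes such as occlusion). The function names, the flip-only transform, and the mixing probability are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def consistent_augment(frames, rng):
    """Apply one shared transform to every frame (models coherent motion)."""
    flip = rng.random() < 0.5  # one decision for the whole clip
    return [np.fliplr(f) if flip else f for f in frames]


def inconsistent_augment(frames, rng, p=0.5):
    """Apply an independent transform per frame (models abrupt changes)."""
    return [np.fliplr(f) if rng.random() < p else f for f in frames]


def hybrid_augment(frames, rng, p_inconsistent=0.3):
    """Mix both modes; p_inconsistent is a hypothetical hyperparameter."""
    if rng.random() < p_inconsistent:
        return inconsistent_augment(frames, rng)
    return consistent_augment(frames, rng)
```

In practice a real pipeline would draw from a richer transform family (crops, color jitter, affine warps), but the key design choice is the same: whether the random parameters are sampled once per clip or once per frame.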
📝 Abstract
Complex video object segmentation continues to face significant challenges in small object recognition, occlusion handling, and dynamic scene modeling. This report presents our solution, which ranked second in the MOSE track of the CVPR 2025 PVUW Challenge. Building on an existing segmentation framework, we propose an improved model named MASSeg for complex video object segmentation, and construct an enhanced dataset, MOSE+, which covers typical scenarios with occlusions, cluttered backgrounds, and small target instances. During training, we combine inter-frame consistent and inconsistent data augmentation strategies to improve robustness and generalization. During inference, we design a mask output scaling strategy to better adapt to varying object sizes and occlusion levels. As a result, MASSeg achieves a J score of 0.8250, an F score of 0.9007, and a J&F score of 0.8628 on the MOSE test set.
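The mask output scaling strategy at inference time can be sketched as follows: estimate the object's apparent size from the predicted mask probabilities, then pick an upsampling factor so that small objects are decoded at higher resolution before thresholding. The area thresholds, scale factors, and nearest-neighbor upsampling here are illustrative assumptions; the paper's actual mechanism and values are not specified in the abstract.

```python
import numpy as np


def adaptive_mask_scale(mask_prob, min_scale=1, max_scale=4):
    """Choose an upsampling factor from the object's apparent size.

    mask_prob: 2-D array of per-pixel foreground probabilities.
    Returns the binarized, upsampled mask and the chosen scale.
    Thresholds below are hypothetical, not the paper's values.
    """
    area_frac = float((mask_prob > 0.5).mean())  # rough object-size estimate
    if area_frac < 0.01:        # tiny object: decode at the finest scale
        scale = max_scale
    elif area_frac < 0.05:      # medium object: intermediate scale
        scale = 2
    else:                       # large object: native resolution suffices
        scale = min_scale
    # nearest-neighbor upsampling via Kronecker product (bilinear would be
    # the more typical choice in a real segmentation head)
    up = np.kron(mask_prob, np.ones((scale, scale)))
    return (up > 0.5).astype(np.uint8), scale
```

The point of the sketch is the control flow, not the numbers: mask resolution becomes a function of object size rather than a fixed decoder output, which is what lets a single model adapt to both small and large instances.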