VideoMaMa: Mask-Guided Video Matting via Generative Prior

📅 2026-01-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the poor generalization of video matting models to real-world footage, which stems from the scarcity of labeled data. It proposes VideoMaMa, a model that combines the generative prior of a pretrained video diffusion model with coarse segmentation masks to produce high-fidelity alpha mattes, and that generalizes zero-shot to real videos despite being trained only on synthetic data. Using VideoMaMa in a scalable pseudo-labeling pipeline, the authors build MA-V, a large-scale dataset with pseudo-labeled matting annotations for more than 50,000 real-world videos. Fine-tuning SAM2 on MA-V yields SAM2-Matte, which is more robust on in-the-wild videos than the same model trained on existing matting datasets.

📝 Abstract

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present the Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which outperforms the same model trained on existing matting datasets in terms of robustness on in-the-wild videos. These findings emphasize the importance of large-scale pseudo-labeled video matting and showcase how generative priors and accessible segmentation cues can drive scalable progress in video matting research.
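
To make the mask-versus-matte distinction in the abstract concrete, the sketch below contrasts a hard segmentation mask with a soft alpha matte using the standard compositing equation I = alpha * F + (1 - alpha) * B. It is only an illustration of what an alpha matte encodes, not VideoMaMa's method; the helper names (`composite`, `binarize`) and the sample alpha values are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]


def composite(alpha: float, fg: Pixel, bg: Pixel) -> Pixel:
    """Blend one pixel with I = alpha * F + (1 - alpha) * B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def binarize(alphas: List[float], threshold: float = 0.5) -> List[float]:
    """Collapse soft alpha values into a hard 0/1 segmentation mask."""
    return [1.0 if a >= threshold else 0.0 for a in alphas]


# Hypothetical row of alpha values crossing a soft edge (e.g. hair or motion blur).
soft_alpha = [0.0, 0.25, 0.5, 0.75, 1.0]
hard_mask = binarize(soft_alpha)

# White foreground composited over a black background.
fg = (1.0, 1.0, 1.0)
bg = (0.0, 0.0, 0.0)

soft_row = [composite(a, fg, bg) for a in soft_alpha]  # gradual transition
hard_row = [composite(a, fg, bg) for a in hard_mask]   # abrupt step at the edge
```

With the soft matte, the composited row fades gradually from background to foreground, while the binarized mask produces an abrupt step. That lost fractional coverage at object boundaries is the information a mask-to-matte model is meant to recover.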

Problem

Research questions and friction points this paper is trying to address.

video matting

generalization

labeled data scarcity

real-world videos

alpha matte

Innovation

Methods, ideas, or system contributions that make the work stand out.

video matting

generative prior

zero-shot generalization

pseudo-labeling

diffusion model

🔎 Similar Papers

VideoPrism: A Foundational Visual Encoder for Video Understanding

2024-02-20 · International Conference on Machine Learning · Citations: 30