TransVDM: Motion-Constrained Video Diffusion Model for Transparent Video Synthesis

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video diffusion models (VDMs) lack explicit modeling of transparency (alpha channels), limiting the quality of transparent video generation. To address this, the paper proposes TransVDM, the first transparent video diffusion framework. It comprises (1) a Transparent Variational Autoencoder (TVAE) that disentangles alpha and RGB representations, and (2) an Alpha Motion Constraint Module (AMCM) that enforces temporal consistency in transparent regions via motion-guided regularization; the authors also contribute (3) a large-scale transparent video dataset of 250K frames. Integrated with a pretrained UNet-based diffusion backbone, the framework improves alpha prediction accuracy and structural-motion coherence, suppressing artifacts and alpha drift in transparent regions, and outperforms prior methods across multiple benchmarks.
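The RGB/alpha disentanglement that the TVAE performs can be pictured with a minimal sketch. This is a conceptual illustration, not the authors' network: the `(T, H, W, 4)` frame layout and the function name are assumptions for illustration only.

```python
import numpy as np

def split_rgba(frames):
    """Split an RGBA video clip into separate RGB and alpha streams.

    frames: (T, H, W, 4) array with values in [0, 1]. In TransVDM a
    learned TVAE encodes these streams into the VDM latent space; here
    we illustrate only the channel disentanglement itself.
    """
    rgb = frames[..., :3]    # (T, H, W, 3) color content
    alpha = frames[..., 3:]  # (T, H, W, 1) transparency
    return rgb, alpha

# Toy 4-frame RGBA clip.
clip = np.random.rand(4, 8, 8, 4)
rgb, alpha = split_rgba(clip)
assert rgb.shape == (4, 8, 8, 3)
assert alpha.shape == (4, 8, 8, 1)
```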

📝 Abstract
Recent developments in Video Diffusion Models (VDMs) have demonstrated remarkable capability to generate high-quality video content. Nonetheless, the potential of VDMs for creating transparent videos remains largely uncharted. In this paper, we introduce TransVDM, the first diffusion-based model specifically designed for transparent video generation. TransVDM integrates a Transparent Variational Autoencoder (TVAE) and a pretrained UNet-based VDM, along with a novel Alpha Motion Constraint Module (AMCM). The TVAE captures the alpha channel transparency of video frames and encodes it into the latent space of the VDMs, facilitating a seamless transition to transparent video diffusion models. To improve the detection of transparent areas, the AMCM integrates motion constraints from the foreground within the VDM, helping to reduce undesirable artifacts. Moreover, we curate a dataset containing 250K transparent frames for training. Experimental results demonstrate the effectiveness of our approach across various benchmarks.
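The idea of constraining foreground motion in the alpha channel can be sketched as a simple temporal-consistency regularizer. This is a stand-in for the AMCM, whose exact formulation the listing does not give; the frame-difference loss and the optional foreground weighting below are assumptions for illustration.

```python
import numpy as np

def alpha_motion_penalty(alpha, fg_mask=None):
    """Toy temporal-consistency penalty on an alpha sequence.

    alpha: (T, H, W) transparency maps in [0, 1].
    fg_mask: optional (T-1, H, W) weights emphasizing foreground
    regions, standing in for AMCM's motion-derived guidance.
    Returns the mean squared difference between consecutive frames.
    """
    diff = (alpha[1:] - alpha[:-1]) ** 2
    if fg_mask is not None:
        diff = diff * fg_mask
    return float(diff.mean())

static = np.ones((4, 8, 8))  # unchanging alpha -> zero penalty
moving = np.zeros((4, 8, 8))
moving[2:] = 1.0             # abrupt alpha flip -> positive penalty
assert alpha_motion_penalty(static) == 0.0
assert alpha_motion_penalty(moving) > 0.0
```

A real implementation would compare each alpha frame against its predecessor warped by optical flow rather than the raw previous frame, so that legitimate foreground motion is not penalized.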
Problem

Research questions and friction points this paper is trying to address.

Transparent video generation remains largely unexplored by video diffusion models
Standard VDMs lack explicit alpha-channel modeling, causing artifacts in transparent regions
No large-scale transparent video dataset was available for training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transparent Variational Autoencoder (TVAE) encoding alpha transparency into the VDM latent space
Alpha Motion Constraint Module (AMCM) applying foreground motion constraints to reduce artifacts
Seamless integration with a pretrained UNet-based VDM
A curated training dataset of 250K transparent frames