From Prompt to Progression: Taming Video Diffusion Models for Seamless Attribute Transition

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video diffusion models struggle to model complex temporal dynamics, particularly when generating videos with gradual attribute transitions (e.g., slow color or shape evolution); mainstream approaches such as prompt interpolation often yield inter-frame inconsistency and motion distortion. To address this, we propose a frame-level denoising guidance mechanism that learns data-driven, smooth transition directions in latent space, jointly optimizing for continuous attribute evolution and faithful motion dynamics. Our contributions are threefold: (1) CAT-Bench, the first benchmark dedicated to evaluating attribute transitions, which assesses attribute accuracy, transition smoothness, and motion fidelity; (2) Transition Score, a novel metric quantifying transition quality; and (3) comprehensive experiments demonstrating significant improvements over state-of-the-art methods in text alignment, visual fidelity, and transition smoothness.
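The Transition Score itself is defined in the full paper. As a rough illustration of how transition smoothness could be quantified, one generic proxy (an assumption here, not the paper's actual formula) measures how evenly spaced consecutive frames are in feature space: a perfectly gradual transition takes equal-sized steps, so the variance of step sizes is near zero, while an abrupt jump inflates it.

```python
import numpy as np

def transition_smoothness(frames):
    """Illustrative smoothness proxy (NOT the paper's Transition Score).

    frames: array of shape (T, ...) holding per-frame features or latents.
    Returns the variance of consecutive-frame step sizes; lower is smoother.
    """
    diffs = np.diff(frames, axis=0)                       # (T-1, ...) frame deltas
    steps = np.linalg.norm(diffs.reshape(len(diffs), -1), axis=1)  # step size per pair
    return float(np.var(steps))

# A linear ramp of frames scores ~0; an abrupt mid-sequence jump scores higher.
linear = np.outer(np.linspace(0.0, 1.0, 8), np.ones(4))
abrupt = np.zeros((8, 4)); abrupt[4:] = 1.0
print(transition_smoothness(linear), transition_smoothness(abrupt))
```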

📝 Abstract
Existing models often struggle with complex temporal changes, particularly when generating videos with gradual attribute transitions. The most common prompt interpolation approach for motion transitions often fails to handle gradual attribute transitions, where inconsistencies tend to become more pronounced. In this work, we propose a simple yet effective method to extend existing models for smooth and consistent attribute transitions by introducing frame-wise guidance during the denoising process. Our approach constructs a data-specific transitional direction for each noisy latent, guiding the gradual shift from initial to final attributes frame by frame while preserving the motion dynamics of the video. Moreover, we present the Controlled-Attribute-Transition Benchmark (CAT-Bench), which integrates both attribute and motion dynamics, to comprehensively evaluate the performance of different models. We further propose two metrics to assess the accuracy and smoothness of attribute transitions. Experimental results demonstrate that our approach performs favorably against existing baselines, achieving visual fidelity, maintaining alignment with text prompts, and delivering seamless attribute transitions. Code and CAT-Bench are released: https://github.com/lynn-ling-lo/Prompt2Progression.
Problem

Research questions and friction points this paper is trying to address.

Handling gradual attribute transitions in video generation models
Addressing inconsistencies in temporal changes during video synthesis
Improving smoothness and consistency of attribute progression over time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frame-wise guidance during denoising process
Data-specific transitional direction for each latent
Controlled-Attribute-Transition Benchmark for evaluation
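The frame-wise guidance idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a linear per-frame schedule between latent encodings of the initial and final attributes, and the names `z_src`, `z_tgt`, and `scale` are hypothetical.

```python
import numpy as np

def frame_wise_guidance(latents, z_src, z_tgt, scale=1.0):
    """Nudge each frame's noisy latent along a per-frame transition direction.

    latents: (T, ...) noisy latents for T video frames at one denoising step
    z_src, z_tgt: latent encodings of the initial / final attribute prompts
    scale: guidance strength (1.0 moves each frame fully onto the schedule)
    """
    T = latents.shape[0]
    guided = np.empty_like(latents)
    for t in range(T):
        alpha = t / (T - 1)  # linear schedule: 0 at the first frame, 1 at the last
        # data-specific transition direction for frame t: toward the
        # interpolated attribute latent, relative to the current noisy latent
        direction = (1.0 - alpha) * z_src + alpha * z_tgt - latents[t]
        guided[t] = latents[t] + scale * direction
    return guided
```

With `scale=1.0`, frame 0 lands on `z_src`, the last frame on `z_tgt`, and intermediate frames on evenly spaced blends; in practice a guidance term like this would be applied with a smaller scale at each denoising step so the model's own motion dynamics are preserved.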