🤖 AI Summary
Existing video diffusion models struggle to model complex temporal dynamics, particularly when generating videos with gradual attribute transitions (e.g., slow color or shape evolution). Mainstream approaches such as prompt interpolation often yield inter-frame inconsistency and motion distortion. To address this, we propose a frame-level denoising guidance mechanism that learns data-driven, smooth transition directions in latent space, jointly optimizing for continuous attribute evolution and faithful motion dynamics. Our contributions are threefold: (1) CAT-Bench, the first benchmark dedicated to evaluating attribute transitions, which assesses attribute accuracy, transition smoothness, and motion fidelity; (2) Transition Score, a novel metric quantifying transition quality; and (3) comprehensive experiments demonstrating significant improvements over state-of-the-art methods in text alignment, visual fidelity, and transition smoothness.
📝 Abstract
Existing video generation models often struggle with complex temporal changes, particularly when generating videos with gradual attribute transitions. Prompt interpolation, the most common approach to motion transitions, often fails on gradual attribute transitions, where inconsistencies tend to become more pronounced. In this work, we propose a simple yet effective method that extends existing models to produce smooth and consistent attribute transitions by introducing frame-wise guidance during the denoising process. Our approach constructs a data-specific transitional direction for each noisy latent, guiding the gradual shift from initial to final attributes frame by frame while preserving the motion dynamics of the video. Moreover, we present the Controlled-Attribute-Transition Benchmark (CAT-Bench), which integrates both attribute and motion dynamics, to comprehensively evaluate the performance of different models. We further propose two metrics to assess the accuracy and smoothness of attribute transitions. Experimental results demonstrate that our approach performs favorably against existing baselines, achieving visual fidelity, maintaining alignment with text prompts, and delivering seamless attribute transitions. Code and CAT-Bench are released at https://github.com/lynn-ling-lo/Prompt2Progression.
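To make the idea of frame-wise guidance concrete, the following is a minimal sketch and not the authors' implementation. It assumes that, at a given denoising step, three per-frame noise predictions are available (a neutral one, one conditioned on the initial attribute, and one conditioned on the final attribute), and that the transition follows a linear per-frame schedule. The function names, the linear schedule, the way the direction is built, and the `scale` parameter are all hypothetical choices made for illustration; the released code may construct the transitional direction quite differently.

```python
import numpy as np


def frame_weights(num_frames: int) -> np.ndarray:
    """Hypothetical linear schedule: 0 at the first frame, 1 at the last."""
    return np.linspace(0.0, 1.0, num_frames)


def framewise_guided_noise(
    eps_neutral: np.ndarray,
    eps_initial: np.ndarray,
    eps_final: np.ndarray,
    scale: float = 1.0,
) -> np.ndarray:
    """Blend per-frame noise predictions along a transition direction.

    All inputs have shape (frames, channels, height, width).
    This is an illustrative assumption, not the paper's formulation.
    """
    num_frames = eps_neutral.shape[0]
    # Broadcast one weight per frame over the remaining axes.
    w = frame_weights(num_frames).reshape(-1, 1, 1, 1)
    # Direction pointing from the initial attribute to the final attribute.
    direction = eps_final - eps_initial
    # Start from the initial-attribute guidance, then move along the
    # direction by a frame-dependent amount.
    guidance = (eps_initial - eps_neutral) + w * direction
    return eps_neutral + scale * guidance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (5, 4, 8, 8)  # 5 frames of a small latent
    eps_n, eps_a, eps_b = (rng.standard_normal(shape) for _ in range(3))
    out = framewise_guided_noise(eps_n, eps_a, eps_b)
    print(out.shape)
```

With `scale=1.0`, the first frame follows the initial-attribute prediction and the last frame follows the final-attribute prediction, with intermediate frames blending the two. Unlike interpolating the text prompts, the blend here is applied to the noise predictions for each frame, which is one plausible reading of "frame-wise guidance during the denoising process".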