PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models

πŸ“… 2026-02-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes PropFly, a framework for propagation-based video editing that eliminates the need for pre-collected paired datasets of source videos and edited results, data that are typically expensive and difficult to acquire. PropFly leverages a pre-trained video diffusion model to synthesize diverse editing supervision signals on the fly during training and optimizes a lightweight adapter with a Guidance-Modulated Flow Matching (GMFM) loss. By combining classifier-free guidance with one-step clean-latent estimation, the method produces high-quality outputs with strong temporal consistency across multiple editing tasks. The authors present this as the first approach to train propagation-based video editing without any pre-built paired data, improving the practicality and scalability of the setting.
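
The "one-step denoising latent estimation" combined with classifier-free guidance can be made concrete under a common rectified-flow convention. Below is a minimal sketch, assuming x_t = (1 - t)·x_0 + t·ε and a velocity-predicting VDM; the names `model`, `cond`, and `uncond` are placeholders, not identifiers from the paper:

```python
import torch

def one_step_x0_estimate(model, x_t, t, cond, uncond, cfg_scale):
    """One-step clean-latent estimate under a rectified-flow convention.

    Assumes x_t = (1 - t) * x_0 + t * eps and that `model` predicts the
    velocity v = eps - x_0, so the clean latent can be recovered as
    x_0 = x_t - t * v. `t` is the (broadcastable) flow time of x_t.
    """
    v_cond = model(x_t, t, cond)      # conditional velocity prediction
    v_uncond = model(x_t, t, uncond)  # unconditional velocity prediction
    # Classifier-free guidance: extrapolate away from the unconditional
    # prediction toward the conditional one.
    v = v_uncond + cfg_scale * (v_cond - v_uncond)
    # One-step estimate of the clean latent from the noised latent.
    return x_t - t * v
```

With this view, a low `cfg_scale` yields an estimate close to the original video context, while a high `cfg_scale` pushes the estimate toward the conditioning signal, which is what lets the two estimates play the roles of "source" and "edited" latents.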

πŸ“ Abstract
Propagation-based video editing enables precise user control by propagating a single edited frame to subsequent frames while preserving the original context, such as motion and structure. However, training such models requires large-scale paired (source and edited) video datasets, which are costly and complex to acquire. Hence, we propose PropFly, a training pipeline for Propagation-based video editing that relies on on-the-Fly supervision from pre-trained video diffusion models (VDMs) instead of off-the-shelf or precomputed paired video editing datasets. Specifically, PropFly leverages one-step clean-latent estimates from intermediate noised latents under varying Classifier-Free Guidance (CFG) scales to synthesize diverse pairs of 'source' (low-CFG) and 'edited' (high-CFG) latents on the fly. The source latent provides the structural information of the video, while the edited latent provides the target transformation to be propagated. Our pipeline trains an additional adapter attached to the pre-trained VDM to propagate edits via a Guidance-Modulated Flow Matching (GMFM) loss, which guides the model to replicate the target transformation. This on-the-fly supervision ensures that the model learns temporally consistent and dynamic transformations. Extensive experiments demonstrate that PropFly significantly outperforms state-of-the-art methods on various video editing tasks, producing high-quality editing results.
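
The abstract does not spell out the GMFM objective, so the following is only a plausible reading: a standard conditional flow-matching loss in which the low-CFG estimate conditions the adapter and the high-CFG estimate is the regression target. It reuses `one_step_x0_estimate` from the sketch above; the `adapter` call signature and the CFG scales are hypothetical:

```python
import torch
import torch.nn.functional as F

def propfly_step(vdm, adapter, x_t, t, cond, uncond,
                 low_cfg=1.0, high_cfg=7.5):
    # On-the-fly pair synthesis with the frozen pre-trained VDM:
    # a low-CFG clean-latent estimate serves as the 'source' (structure),
    # a high-CFG estimate as the 'edited' target.
    with torch.no_grad():
        x0_src = one_step_x0_estimate(vdm, x_t, t, cond, uncond, low_cfg)
        x0_edit = one_step_x0_estimate(vdm, x_t, t, cond, uncond, high_cfg)

    # Conditional flow matching toward the edited latent: re-noise the
    # target along a linear path and let the adapter predict the
    # transporting velocity while it sees the source latent as
    # structural conditioning.
    eps = torch.randn_like(x0_edit)
    s = torch.rand(x0_edit.shape[0], *([1] * (x0_edit.dim() - 1)),
                   device=x0_edit.device)
    z_s = (1.0 - s) * x0_edit + s * eps
    v_pred = adapter(z_s, s, cond, x0_src)  # hypothetical signature

    # Regression target: the velocity eps - x0_edit of the linear path
    # between the edited latent and noise.
    return F.mse_loss(v_pred, eps - x0_edit)
```

Only the adapter receives gradients here; keeping the VDM frozen is consistent with the abstract's description of "an additional adapter attached to the pre-trained VDM."
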
Problem

Research questions and friction points this paper is trying to address.

propagation-based video editing
paired video datasets
video diffusion models
temporal consistency
video editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

on-the-fly supervision
video diffusion models
propagation-based video editing
Guidance-Modulated Flow Matching
Classifier-Free Guidance
πŸ”Ž Similar Papers
No similar papers found.