🤖 AI Summary
Existing inversion-free video editing methods struggle to maintain spatiotemporal consistency in multi-object or long-duration scenarios because the editing signal is unstable in the latent space. This work proposes FlowAnchor, a training-free framework that attributes this instability to drift of the editing signal in the high-dimensional video latent space, caused by imprecise spatial localization and length-induced magnitude attenuation. It addresses the problem with two core mechanisms: spatially aware attention refinement, which aligns textual guidance with the corresponding spatial regions, and adaptive magnitude modulation, which stabilizes editing intensity and steers the flow model toward the target distribution. Experiments show that the approach improves editing fidelity, temporal coherence, and computational efficiency, particularly in complex scenes with multiple interacting objects or rapid motion.
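To make the notion of an "editing signal" concrete, here is a minimal sketch of inversion-free editing with a rectified flow, modeled on the published FlowEdit recipe. It is assumed to be representative of the methods described above and is not confirmed as the baseline this work builds on. A toy closed-form velocity field (`toy_velocity`, hypothetical) stands in for a video flow model, and the vectors `mu_src` and `mu_tar` stand in for the source and target prompts. The optional `mask` argument is a hypothetical illustration of restricting where the signal acts.

```python
import numpy as np


def toy_velocity(z, t, mu):
    """Exact rectified-flow velocity for data concentrated at `mu` (toy stand-in).

    Assumed convention: z_t = (1 - t) * x0 + t * noise, so v = dz_t/dt = noise - x0,
    which for x0 = mu simplifies to (z_t - mu) / t.
    """
    return (z - mu) / t


def inversion_free_edit(x_src, mu_src, mu_tar, timesteps, rng, mask=None):
    """FlowEdit-style inversion-free editing loop (sketch, not FlowAnchor itself).

    x_src:          source latent, e.g. shaped (frames, height, width, channels)
    mu_src, mu_tar: hypothetical stand-ins for the source/target text conditions
    timesteps:      decreasing times in (0, 1]; a final t = 0 is appended
    mask:           optional hypothetical 0/1 map restricting *where* the signal acts
    """
    z_edit = x_src.copy()
    ts = list(timesteps) + [0.0]
    for t, t_prev in zip(ts[:-1], ts[1:]):
        noise = rng.standard_normal(x_src.shape)
        z_src_t = (1 - t) * x_src + t * noise   # noised source; no inversion needed
        z_tar_t = z_edit + z_src_t - x_src      # same noise, shifted by the current edit
        # Editing signal: target-conditioned minus source-conditioned velocity.
        delta_v = toy_velocity(z_tar_t, t, mu_tar) - toy_velocity(z_src_t, t, mu_src)
        if mask is not None:
            delta_v = delta_v * mask
        z_edit = z_edit + (t_prev - t) * delta_v  # Euler step from t to t_prev
    return z_edit
```

In this toy setting the velocity is linear in `z`, so the shared noise cancels in `delta_v` and the loop moves the latent by `mu_tar - mu_src` inside the mask (up to floating-point error) while leaving everything outside it untouched. A real video model offers no such cancellation, which is where an unstable editing signal can push the trajectory off target.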
📝 Abstract
We propose FlowAnchor, a training-free framework for stable and efficient inversion-free, flow-based video editing. Inversion-free editing methods have recently shown impressive efficiency and structure preservation on images by directly steering the sampling trajectory with an editing signal. However, extending this paradigm to videos remains challenging: it often fails in multi-object scenes or as the number of frames grows. We identify the root cause as the instability of the editing signal in high-dimensional video latent spaces, which arises from imprecise spatial localization and length-induced magnitude attenuation. To overcome this challenge, FlowAnchor explicitly anchors both where to edit and how strongly to edit. It introduces Spatial-aware Attention Refinement, which enforces consistent alignment between textual guidance and spatial regions, and Adaptive Magnitude Modulation, which adaptively preserves sufficient editing strength. Together, these mechanisms stabilize the editing signal and guide the flow-based evolution toward the desired target distribution. Extensive experiments demonstrate that FlowAnchor achieves more faithful, temporally coherent, and computationally efficient video editing across challenging multi-object and fast-motion scenarios. The project page is available at https://cuc-mipg.github.io/FlowAnchor.github.io/.
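The two anchoring ideas can be illustrated with a minimal NumPy sketch over flattened video tokens. This is not FlowAnchor's published formulation. The function names, the log-space `boost` added to attention logits inside a region mask, and the RMS-matching rule with reference `ref_rms` and cap `max_gain` are all hypothetical choices. They are meant only to show what anchoring "where to edit" and "how strongly to edit" could look like.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def refine_attention(logits, region_mask, boost=2.0):
    """Hypothetical spatial-aware refinement of one edit word's attention.

    logits:      (num_tokens,) attention logits of the edit word over video tokens
    region_mask: (num_tokens,) 1 inside the region the word should edit, else 0
    boost:       hypothetical log-space bias added inside the region
    Returns a distribution whose mass is shifted toward the masked region.
    """
    return softmax(logits + boost * region_mask)


def modulate_magnitude(delta_v, region_mask, ref_rms, max_gain=4.0, eps=1e-8):
    """Hypothetical adaptive magnitude modulation of an editing signal.

    Rescales delta_v inside the region so its per-element RMS there moves toward
    ref_rms. The gain is clipped to [1, max_gain]: the signal is only ever
    strengthened, never weakened, and never amplified without bound.
    """
    inside = region_mask.astype(bool)
    rms = np.sqrt(np.mean(delta_v[inside] ** 2)) if inside.any() else 0.0
    gain = np.clip(ref_rms / (rms + eps), 1.0, max_gain)
    out = delta_v.copy()
    out[inside] *= gain  # scale only the masked entries; the rest stay unchanged
    return out, gain
```

Clipping the gain at 1 from below means the sketch only counteracts attenuation and leaves an already strong signal alone. The upper cap guards against amplifying noise when the measured signal is nearly zero.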