RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

πŸ“… 2025-05-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses weak motion controllability and poor text-action alignment in diffusion-based video generation. The authors propose a training-free video motion transfer method that repurposes Rotary Position Embedding (RoPE), originally designed for sequence modeling in diffusion transformers, as an explicit, differentiable motion carrier. Specifically, optical flow is extracted from a reference video to obtain motion trajectories, and the complex-exponential RoPE tensors are then warped according to these trajectories. During denoising, a trajectory alignment loss and a Fourier-phase regularization term are jointly optimized to ensure precise motion injection and spatiotemporal consistency. To the authors' knowledge, this is the first approach to leverage RoPE for explicit, differentiable motion modeling; it effectively eliminates duplicated frames and high-frequency artifacts. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, with significant improvements in quantitative metrics. Generated videos exhibit natural motion dynamics, high text fidelity, and strong temporal coherence.
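The core mechanism described above, encoding motion as a shift of RoPE's rotary phases, can be illustrated with a minimal NumPy sketch. The helper names (`rope_phases`, `warp_rope_with_flow`) and the 1-D position layout are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def rope_phases(positions, dim, base=10000.0):
    """Standard RoPE phases: exp(i * pos * omega_k) for each frequency pair."""
    k = np.arange(dim // 2)
    omega = base ** (-2.0 * k / dim)             # per-pair rotation frequencies
    angles = np.asarray(positions, float)[:, None] * omega[None, :]
    return np.exp(1j * angles)                   # shape: (num_positions, dim // 2)

def warp_rope_with_flow(positions, flow_offsets, dim):
    """Shift each token's position by its optical-flow offset before building
    the complex-exponential tensor, so reference motion becomes a phase shift."""
    return rope_phases(np.asarray(positions, float) + flow_offsets, dim)
```

Because the warp only changes the angle of unit-magnitude complex numbers, it remains differentiable with respect to the offsets, which is what permits the later optimization step.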

πŸ“ Abstract
We propose RoPECraft, a training-free video motion transfer method for diffusion transformers that operates solely by modifying their rotary positional embeddings (RoPE). We first extract dense optical flow from a reference video, and utilize the resulting motion offsets to warp the complex-exponential tensors of RoPE, effectively encoding motion into the generation process. These embeddings are then further optimized during denoising time steps via trajectory alignment between the predicted and target velocities using a flow-matching objective. To keep the output faithful to the text prompt and prevent duplicate generations, we incorporate a regularization term based on the phase components of the reference video's Fourier transform, projecting the phase angles onto a smooth manifold to suppress high-frequency artifacts. Experiments on benchmarks reveal that RoPECraft outperforms all recently published methods, both qualitatively and quantitatively.
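The abstract's Fourier-phase regularizer can be approximated with a short sketch: compare the FFT phase angles of predicted and reference frames through `1 - cos(·)`, a smooth surrogate for wrapped angular distance (this smooth mapping is an assumption standing in for the paper's manifold projection):

```python
import numpy as np

def fourier_phase_loss(pred_frames, ref_frames):
    """Penalize disagreement between the Fourier phase spectra of predicted
    and reference frames. 1 - cos(dphi) is smooth and invariant to 2*pi
    wrap-around, unlike a raw phase difference."""
    phase_pred = np.angle(np.fft.fft2(pred_frames))   # FFT over the last two axes
    phase_ref = np.angle(np.fft.fft2(ref_frames))
    return float(np.mean(1.0 - np.cos(phase_pred - phase_ref)))
```

The loss is zero only when every spatial-frequency component of the prediction agrees in phase with the reference, which is what suppresses duplicated content and high-frequency artifacts.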
Problem

Research questions and friction points this paper is trying to address.

Training-free video motion transfer for diffusion transformers
Encoding motion via rotary positional embeddings optimization
Maintaining text prompt fidelity while preventing duplicate generations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modifies rotary positional embeddings for motion transfer
Optimizes embeddings via trajectory alignment during denoising
Uses Fourier phase regularization to prevent artifacts
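The trajectory-alignment bullet above amounts to a velocity-matching objective under the flow-matching view: match the model's predicted velocity field to the target along the reference trajectories. A minimal NumPy sketch, where the masking convention and helper name are assumptions:

```python
import numpy as np

def trajectory_alignment_loss(v_pred, v_target, traj_mask):
    """Mean squared error between predicted and target velocities, restricted
    to tokens covered by the reference trajectories (traj_mask: frames x tokens)."""
    sq_err = (v_pred - v_target) ** 2               # (frames, tokens, dim)
    masked = sq_err * traj_mask[..., None]
    return float(masked.sum() / np.maximum(traj_mask.sum(), 1.0))
```

In the paper's setting, the gradient of this loss flows back into the warped RoPE embeddings rather than the model weights, which is what keeps the method training-free.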
πŸ”Ž Similar Papers
No similar papers found.