🤖 AI Summary
This work addresses the challenge of controllably editing the motion trajectory of a target object in videos while preserving the original scene content. To this end, the authors propose a two-stage framework: first, a cross-view motion transformation module maps a user-specified trajectory, provided only in the initial frame, into per-frame bounding boxes that account for camera motion; second, a motion-conditioned video resynthesis module generates the object along this trajectory while maintaining background consistency. By eliminating the need for complex point-trajectory inputs, the method significantly improves user-friendliness and temporal coherence. Experiments demonstrate that the approach produces more realistic, temporally consistent, and controllable motion edits on diverse real-world videos than existing image-to-video or video-to-video methods.
📝 Abstract
We study object motion path editing in videos, where the goal is to alter a target object's trajectory while preserving the original scene content. Prior video editing methods primarily manipulate appearance or rely on point-track-based trajectory control, which is often difficult for users to provide at inference time, especially in videos with camera motion. In contrast, we offer a practical, easy-to-use approach to controllable object-centric motion editing. We present Trace, a framework that lets users design the desired trajectory in a single anchor frame and then synthesizes a temporally consistent edited video. Our approach addresses this task with a two-stage pipeline: a cross-view motion transformation module that maps the first-frame path design to frame-aligned box trajectories under camera motion, and a motion-conditioned video re-synthesis module that follows these trajectories to regenerate the object while preserving the remaining content of the input video. Experiments on diverse real-world videos show that our method produces more coherent, realistic, and controllable motion edits than recent image-to-video and video-to-video methods.