🤖 AI Summary
This work addresses the challenge of training-free, object-level motion control in image-to-video generation. The authors propose a fine-tuning-free framework that enables precise object motion editing using only a target frame together with motion priors derived from the first and last frames, eliminating the need for complex inputs such as trajectories, masks, or motion fields. Central to their approach is ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight seed-search strategy integrated with semantic-guided object insertion and robust background inpainting to ensure both scene coherence and accurate motion control. The study further introduces object-motion benchmarks and metrics that explicitly exclude camera-motion interference, demonstrating significant performance gains over existing methods on FLF2V-obj and VBench. Two new benchmarks, S2M-DAVIS and S2M-OMB, are released to support future research.
📝 Abstract
We present Search2Motion, a training-free framework for object-level motion editing in image-to-video generation. Unlike prior methods requiring trajectories, bounding boxes, masks, or motion fields, Search2Motion adopts target-frame-based control, leveraging first-last-frame motion priors to realize object relocation while preserving scene stability without fine-tuning. Reliable target-frame construction is achieved through semantic-guided object insertion and robust background inpainting. We further show that early-step self-attention maps predict object and camera dynamics, offering interpretable user feedback and motivating ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight search strategy that improves motion fidelity without look-ahead sampling or external evaluators. Noting that existing benchmarks conflate object and camera motion, we introduce S2M-DAVIS and S2M-OMB for stable-camera, object-only evaluation, alongside FLF2V-obj metrics that isolate object artifacts without requiring ground-truth trajectories. Search2Motion consistently outperforms baselines on FLF2V-obj and VBench.
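The abstract describes ACE-Seed as scoring candidate noise seeds by the consensus of early-step self-attention maps, avoiding full look-ahead sampling. The sketch below illustrates that selection loop under stated assumptions: the attention-extraction stub, the consensus metric (mean attention mass inside the target region minus its cross-step deviation), and all function names are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def early_step_attention_maps(seed, num_steps=3, size=16):
    """Stub: in the real pipeline these would be self-attention maps
    taken from the video diffusion model at the first few denoising
    steps. Here we simulate per-step spatial maps deterministically
    from the seed (hypothetical stand-in)."""
    rng = np.random.default_rng(seed)
    maps = rng.random((num_steps, size, size))
    # Normalize each step's map so it behaves like an attention distribution.
    return maps / maps.sum(axis=(1, 2), keepdims=True)

def consensus_score(maps, target_mask):
    """Assumed metric: reward seeds whose early-step attention mass
    falls inside the desired object region consistently across steps."""
    per_step = (maps * target_mask).sum(axis=(1, 2))  # mass inside target, per step
    return per_step.mean() - per_step.std()           # high mean, low disagreement

def ace_seed_search(candidate_seeds, target_mask):
    """Pick the seed whose early-step attention best agrees with the
    target placement -- no full-video look-ahead sampling required."""
    scores = {s: consensus_score(early_step_attention_maps(s), target_mask)
              for s in candidate_seeds}
    return max(scores, key=scores.get)

# Example: the target object should occupy the top-left quadrant.
mask = np.zeros((16, 16))
mask[:8, :8] = 1.0
best = ace_seed_search(range(8), mask)
```

Because scoring touches only a few early denoising steps, the search cost stays small relative to generating a full video per candidate seed, which is the efficiency argument the abstract makes against look-ahead sampling or external evaluators.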