MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image editing datasets lack high-quality image pairs exhibiting realistic, physically plausible motion transformations, hindering the development of motion-centered editing methods. To address this, we introduce MotionEdit, the first high-fidelity paired image dataset explicitly designed for motion editing, and formally define the novel task of *motion-centric image editing*: precisely modifying subject actions and interactions while preserving identity consistency, structural coherence, and physical plausibility. We further construct MotionEdit-Bench, a comprehensive multi-dimensional evaluation benchmark. Additionally, we propose MotionNFT, a motion-guided negative-aware fine-tuning framework that uses motion alignment rewards, computed from the flow between input and edited images, to optimize diffusion models for high-fidelity motion transfer. Extensive experiments on FLUX.1 Kontext and Qwen-Image-Edit demonstrate significant improvements in motion fidelity and editing quality without compromising general-purpose editing capabilities.

📝 Abstract
We introduce MotionEdit, a novel dataset for motion-centric image editing: the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibility. Unlike existing image editing datasets that focus on static appearance changes or contain only sparse, low-quality motion edits, MotionEdit provides high-fidelity image pairs depicting realistic motion transformations extracted and verified from continuous videos. This new task is not only scientifically challenging but also practically significant, powering downstream applications such as frame-controlled video synthesis and animation. To evaluate model performance on the novel task, we introduce MotionEdit-Bench, a benchmark that challenges models on motion-centric edits and measures model performance with generative, discriminative, and preference-based metrics. Benchmark results reveal that motion editing remains highly challenging for existing state-of-the-art diffusion-based editing models. To address this gap, we propose MotionNFT (Motion-guided Negative-aware Fine Tuning), a post-training framework that computes motion alignment rewards based on how well the motion flow between input and model-edited images matches the ground-truth motion, guiding models toward accurate motion transformations. Extensive experiments on FLUX.1 Kontext and Qwen-Image-Edit show that MotionNFT consistently improves editing quality and motion fidelity of both base models on the motion editing task without sacrificing general editing ability, demonstrating its effectiveness.
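To make the reward idea concrete, here is a minimal sketch of a flow-based motion alignment reward. The paper does not specify its flow estimator, so OpenCV's Farneback flow serves as a stand-in, and `motion_alignment_reward` and its scale constant are illustrative, not the authors' API.

```python
# Minimal sketch of a flow-based motion alignment reward, in the spirit of
# MotionNFT. Farneback flow is an illustrative stand-in for whatever flow
# estimator the paper actually uses.
import cv2
import numpy as np

def dense_flow(src_bgr: np.ndarray, dst_bgr: np.ndarray) -> np.ndarray:
    """Dense optical flow from src to dst, shape (H, W, 2)."""
    src = cv2.cvtColor(src_bgr, cv2.COLOR_BGR2GRAY)
    dst = cv2.cvtColor(dst_bgr, cv2.COLOR_BGR2GRAY)
    # Positional args: pyr_scale, levels, winsize, iterations,
    # poly_n, poly_sigma, flags.
    return cv2.calcOpticalFlowFarneback(src, dst, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

def motion_alignment_reward(input_img, edited_img, target_img) -> float:
    """Reward in (0, 1]: high when the input->edited flow matches the
    input->ground-truth flow, low when the motion diverges."""
    pred_flow = dense_flow(input_img, edited_img)
    gt_flow = dense_flow(input_img, target_img)
    # Mean endpoint error (EPE) between the two flow fields.
    epe = float(np.linalg.norm(pred_flow - gt_flow, axis=-1).mean())
    return float(np.exp(-epe / 10.0))  # 10.0 is an illustrative scale
```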
Problem

Research questions and friction points this paper is trying to address.

No existing dataset provides high-quality paired examples for motion-centric image editing.
No benchmark evaluates model performance on motion-centric edits.
State-of-the-art editing models achieve low motion fidelity, motivating a dedicated fine-tuning framework.
Innovation

Methods, ideas, or system contributions that make the work stand out.

MotionEdit dataset for realistic motion transformations
MotionEdit-Bench benchmark with multi-metric evaluation
MotionNFT framework for motion-aligned fine-tuning (see the sketch after this list)
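As a rough illustration of how such a reward could plug into post-training, the snippet below shows a generic advantage-weighted update; it is not the paper's actual NFT objective, and `sample_edit`, `per_sample_loss`, and `reward_fn` are hypothetical callables supplied by the caller. Edits scoring below the baseline receive negative weight, which captures the "negative-aware" intuition.

```python
# Generic advantage-weighted post-training step; NOT the paper's exact
# MotionNFT objective. All callables are hypothetical placeholders.
import torch

def nft_style_step(model, optimizer, batch, sample_edit, per_sample_loss,
                   reward_fn, baseline=0.5):
    """sample_edit(model, inputs, instructions) -> edited images
    per_sample_loss(model, batch, edited)       -> per-sample loss, shape (N,)
    reward_fn(input, edited, target)            -> scalar reward in [0, 1]"""
    edited = sample_edit(model, batch["input"], batch["instruction"])
    rewards = torch.tensor([reward_fn(i, e, t) for i, e, t in
                            zip(batch["input"], edited, batch["target"])])
    # Below-baseline edits get a negative weight: the update pushes the
    # model away from them ("negative-aware").
    advantage = rewards - baseline
    loss = (advantage * per_sample_loss(model, batch, edited)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```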