AI Summary
Drag-based image editing (DBIE) faces two key challenges: point-level drag intentions are ambiguous, and existing methods offer poor controllability and quality because they alternate between motion supervision and point tracking. To address these, we propose DragNeXt, which reformulates DBIE as explicit deformation, rotation, or translation operations applied to user-specified regions, thereby eliminating intent ambiguity through precise region definition. We introduce the first formalization of DBIE as a latent region optimization (LRO) problem and design a progressive backward self-intervention (PBSI) mechanism that unifies motion guidance with structural constraints. Furthermore, we incorporate a region-level structural-aware loss and a handle-based interaction paradigm. Evaluated on NextBench, DragNeXt significantly outperforms state-of-the-art methods, achieving more accurate edits, superior structural consistency, and a more concise, efficient workflow.
Abstract
Drag-Based Image Editing (DBIE), which allows users to manipulate images by directly dragging objects within them, has recently attracted much attention from the community. However, it faces two key challenges: (i) point-based drag is often highly ambiguous and difficult to align with users' intentions; (ii) current DBIE methods primarily rely on alternating between motion supervision and point tracking, which is not only cumbersome but also fails to produce high-quality results. These limitations motivate us to explore DBIE from a new perspective: redefining it as deformation, rotation, and translation of user-specified handle regions. By requiring users to explicitly specify both drag areas and drag types, we can effectively address the ambiguity issue. Furthermore, we propose a simple yet effective editing framework, dubbed DragNeXt. It unifies DBIE as a Latent Region Optimization (LRO) problem and solves it through Progressive Backward Self-Intervention (PBSI), simplifying the overall procedure of DBIE while further enhancing quality by fully leveraging region-level structure information and progressive guidance from intermediate drag states. We validate DragNeXt on our NextBench, and extensive experiments demonstrate that our proposed method significantly outperforms existing approaches. Code will be released on GitHub.
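To make the region-level formulation concrete, the following is a minimal sketch of how a drag instruction with an explicit area and type could map handle-region coordinates to target coordinates. It is not the DragNeXt implementation: the names `RegionDrag` and `apply_drag` and the `progress` parameter (a stand-in for an intermediate drag state) are hypothetical, only translation and rotation are covered, and no latent optimization is performed.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class RegionDrag:
    """Hypothetical container for one region-level drag instruction."""
    region: List[Point]            # coordinates belonging to the handle region
    kind: str                      # "translation" or "rotation"
    offset: Point = (0.0, 0.0)     # displacement, used by translation
    angle_deg: float = 0.0         # rotation angle in degrees, used by rotation
    pivot: Point = (0.0, 0.0)      # rotation center, used by rotation


def apply_drag(drag: RegionDrag, progress: float = 1.0) -> List[Point]:
    """Return region coordinates after applying a fraction of the drag.

    progress=1.0 yields the final target; smaller values yield
    intermediate states (an assumption made for illustration).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")

    if drag.kind == "translation":
        dx, dy = drag.offset
        return [(x + progress * dx, y + progress * dy) for x, y in drag.region]

    if drag.kind == "rotation":
        theta = math.radians(drag.angle_deg * progress)
        c, s = math.cos(theta), math.sin(theta)
        px, py = drag.pivot
        # Standard 2D rotation of each point about the pivot.
        return [
            (px + c * (x - px) - s * (y - py),
             py + s * (x - px) + c * (y - py))
            for x, y in drag.region
        ]

    raise ValueError(f"unsupported drag kind: {drag.kind!r}")


if __name__ == "__main__":
    move = RegionDrag(region=[(0.0, 0.0), (1.0, 0.0)],
                      kind="translation", offset=(2.0, 3.0))
    print(apply_drag(move))        # final target coordinates
    print(apply_drag(move, 0.5))   # halfway state
```

Because the drag type is stated explicitly, the same region can be translated or rotated without the editor having to guess the intent from a single point pair, which is the ambiguity the abstract describes.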