🤖 AI Summary
Existing drag-based image editing methods operate in the latent space of generative models and therefore suffer from low spatial precision, high latency, and strong model dependency. This paper proposes a pixel-space framework that decouples editing into two sequential stages: bidirectional geometric deformation, followed by image inpainting. First, an elastic deformation model provides a pixel-accurate bidirectional warping preview in about 0.01 seconds for 512×512 images; second, a generic image inpainting model reconstructs the remaining regions in about 0.3 seconds. Because drag inputs are converted into the standard inpainting input format, the method acts as a plug-and-play adapter for arbitrary inpainting models and requires no architectural modification to the inpainter. Experiments demonstrate substantial improvements in interactive responsiveness and manipulation accuracy, yielding more natural edits with better controllability and visual quality than current state-of-the-art methods.
📝 Abstract
Drag-based image editing has emerged as a powerful paradigm for intuitive image manipulation. However, existing approaches predominantly rely on manipulating the latent space of generative models, leading to limited precision, delayed feedback, and model-specific constraints. To address these limitations, we present Inpaint4Drag, a novel framework that decomposes drag-based editing into pixel-space bidirectional warping and image inpainting. Inspired by elastic object deformation in the physical world, we treat image regions as deformable materials that maintain a natural shape under user manipulation. Our method achieves real-time warping previews (0.01 s) and efficient inpainting (0.3 s) at 512×512 resolution, significantly improving the interaction experience compared to existing methods that require minutes per edit. By transforming drag inputs directly into standard inpainting formats, our approach serves as a universal adapter for any inpainting model without architecture modification, automatically inheriting all future improvements in inpainting technology. Extensive experiments demonstrate that our method achieves superior visual quality and precise control while maintaining real-time performance. Project page: https://visual-ai.github.io/inpaint4drag/
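To make the "warp, then inpaint" decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single drag applied to a user-selected region and replaces the paper's elastic bidirectional warping with a plain rigid translation, which is enough to show how a drag can be turned into the standard (image, mask) pair that inpainting models accept. The function name `warp_region_to_inpaint_input` and its parameters are hypothetical.

```python
import numpy as np


def warp_region_to_inpaint_input(image, region_mask, handle, target):
    """Move a masked region by one drag and return (warped_image, hole_mask).

    Assumption: a rigid translation stands in for the elastic deformation
    described in the abstract. `handle` and `target` are (x, y) pixel points.
    """
    h, w = region_mask.shape
    dx = int(round(target[0] - handle[0]))
    dy = int(round(target[1] - handle[1]))

    # Source pixels of the selected region and their destinations.
    ys, xs = np.nonzero(region_mask)
    new_ys, new_xs = ys + dy, xs + dx
    inside = (new_ys >= 0) & (new_ys < h) & (new_xs >= 0) & (new_xs < w)

    # Copy region content to its new location (read from the original image).
    warped = image.copy()
    moved_mask = np.zeros((h, w), dtype=bool)
    warped[new_ys[inside], new_xs[inside]] = image[ys[inside], xs[inside]]
    moved_mask[new_ys[inside], new_xs[inside]] = True

    # Pixels vacated by the region and not covered by moved content are
    # unknown; they form the mask handed to the inpainting model.
    hole_mask = region_mask.astype(bool) & ~moved_mask
    warped[hole_mask] = 0
    return warped, hole_mask


# Usage: drag a 2x2 patch two pixels to the right on a toy 8x8 RGB image.
image = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
region = np.zeros((8, 8), dtype=bool)
region[2:4, 2:4] = True
warped, hole = warp_region_to_inpaint_input(image, region, (2, 2), (4, 2))
```

The pair `(warped, hole)` could then be passed to any off-the-shelf inpainting model, which is the sense in which the drag edit requires no change to the inpainter itself. A real elastic warp would also need to handle gaps created by stretching, which this translation-only sketch does not address.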