DirectDrag: High-Fidelity, Mask-Free, Prompt-Free Drag-based Image Editing via Readout-Guided Feature Alignment

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing drag-based image editing methods rely heavily on manually annotated masks and text prompts; without them, they often produce visual artifacts or spatial misalignment. To address this, we propose the first mask-free, prompt-free, point-driven drag-editing framework. Our method automatically infers a soft mask from user-specified displacement points to localize editable regions, introduces a readout-guided feature alignment mechanism that leverages differentiable activations from intermediate diffusion-model layers to preserve structural consistency, and incorporates point-guided deformation propagation with dynamic soft-mask optimization. Evaluated on DragBench and in real-world scenarios, our approach achieves state-of-the-art performance in image fidelity, drag accuracy, and interaction efficiency, and is the first to enable truly end-to-end, prompt-agnostic, high-precision, geometrically controllable image editing.
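The summary says a soft mask is inferred automatically from the user's displacement points. The paper's actual Auto Soft Mask Generation module is not detailed here; as a rough illustration of the general idea only, a soft editable-region mask can be built by placing a Gaussian falloff around each source→target drag path (the function name, falloff radius `sigma`, and straight-line path sampling are all assumptions, not the authors' implementation):

```python
import numpy as np

def soft_mask_from_drags(h, w, drags, sigma=6.0, n_samples=32):
    """Build a soft mask in [0, 1] covering each source->target drag path.

    drags: list of ((sx, sy), (tx, ty)) pixel coordinates.
    sigma: Gaussian falloff radius in pixels (assumed hyperparameter).
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    mask = np.zeros((h, w))
    for (sx, sy), (tx, ty) in drags:
        # Sample points along the straight drag path from source to target.
        ts = np.linspace(0.0, 1.0, n_samples)
        px = sx + ts * (tx - sx)
        py = sy + ts * (ty - sy)
        # Squared distance from every pixel to its nearest path sample.
        d2 = ((xs[..., None] - px) ** 2 + (ys[..., None] - py) ** 2).min(axis=-1)
        # Gaussian falloff: 1.0 on the path, decaying smoothly away from it.
        mask = np.maximum(mask, np.exp(-d2 / (2.0 * sigma ** 2)))
    return mask

m = soft_mask_from_drags(64, 64, [((10, 32), (50, 32))])
```

The soft (rather than binary) falloff lets deformation blend into the surrounding context instead of creating a hard editing boundary, which matches the summary's claim of preserving contextual integrity.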

📝 Abstract
Drag-based image editing using generative models provides intuitive control over image structures. However, existing methods rely heavily on manually provided masks and textual prompts to preserve semantic fidelity and motion precision. Removing these constraints creates a fundamental trade-off: visual artifacts without masks and poor spatial control without prompts. To address these limitations, we propose DirectDrag, a novel mask- and prompt-free editing framework. DirectDrag enables precise and efficient manipulation with minimal user input while maintaining high image fidelity and accurate point alignment. DirectDrag introduces two key innovations. First, we design an Auto Soft Mask Generation module that infers editable regions from point displacement, automatically localizing deformation along movement paths while preserving contextual integrity through the generative model's inherent capacity. Second, we develop a Readout-Guided Feature Alignment mechanism that leverages intermediate diffusion activations to maintain structural consistency during point-based edits, substantially improving visual fidelity. Despite operating without manual masks or prompts, DirectDrag achieves superior image quality compared to existing methods while maintaining competitive drag accuracy. Extensive experiments on DragBench and real-world scenarios demonstrate the effectiveness and practicality of DirectDrag for high-quality, interactive image manipulation. Project Page: https://frakw.github.io/DirectDrag/. Code is available at: https://github.com/frakw/DirectDrag.
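The abstract describes Readout-Guided Feature Alignment as using intermediate diffusion activations to keep structure consistent during the edit. The paper's exact objective is not given here; a minimal sketch of the underlying idea is a masked alignment loss that penalizes deviation of the edited image's intermediate ("readout") features from the reference features everywhere the soft mask marks as non-editable (the function name, the L2 form, and the mask gating are assumptions):

```python
import numpy as np

def readout_alignment_loss(feat_edit, feat_ref, soft_mask):
    """Masked L2 loss: edited intermediate activations should match the
    reference outside the editable region.

    feat_edit, feat_ref: (C, H, W) intermediate diffusion activations.
    soft_mask: (H, W) in [0, 1]; 1 = fully editable, 0 = preserve.
    """
    keep = 1.0 - soft_mask                              # preservation weight
    diff2 = ((feat_edit - feat_ref) ** 2).sum(axis=0)   # per-pixel sq. error
    return (keep * diff2).sum() / (keep.sum() + 1e-8)   # weighted mean

# Toy usage: features differ everywhere, nothing is marked editable.
ref = np.zeros((2, 4, 4))
edit = np.ones((2, 4, 4))
loss = readout_alignment_loss(edit, ref, np.zeros((4, 4)))  # = 2.0
```

In an actual diffusion pipeline such activations would be pulled from intermediate U-Net layers (e.g. via forward hooks) and the loss backpropagated into the latent being optimized; here plain arrays stand in for those features.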
Problem

Research questions and friction points this paper is trying to address.

Eliminates the need for manual masks and text prompts in drag-based image editing
Addresses the trade-off of visual artifacts without masks and poor spatial control without prompts
Enables precise point-based manipulation while maintaining high image fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Auto Soft Mask Generation for editable region inference
Readout-Guided Feature Alignment using diffusion activations
Mask- and prompt-free framework for high-fidelity editing
Sheng-Hao Liao
National Taiwan University of Science and Technology
Shang-Fu Chen
National Taiwan University
Tai-Ming Huang
National Taiwan University
Wen-Huang Cheng
Professor, IEEE Fellow, National Taiwan University
Artificial Intelligence · Multimedia · Computer Vision · Machine Learning
Kai-Lung Hua
National Taiwan University of Science and Technology, Microsoft Taiwan