AttentionDrag: Exploiting Latent Correlation Knowledge in Pre-trained Diffusion Models for Image Editing

📅 2025-06-16
🤖 AI Summary
Traditional point-driven image editing methods rely on iterative optimization or geometric transformations. They are inefficient and struggle to model semantic correlations, so they do not fully exploit the editing potential of pre-trained diffusion models. This paper proposes a single-step, point-driven editing framework that requires neither fine-tuning nor iterative optimization. It mines the cross-region semantic correlations implicitly encoded in the U-Net's self-attention during DDIM inversion and uses them to adaptively generate context-aware masks, enabling precise, semantically consistent interactive editing. Its core idea is to use the diffusion model's self-attention explicitly as a semantic prior for point-guided editing. Experiments show that the method outperforms most state-of-the-art approaches in semantic consistency and localization accuracy while maintaining high visual fidelity, and that it edits significantly faster than mainstream approaches, which makes responsive interactive editing practical.
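
The self-attention maps the method reuses are read from the U-Net while it runs DDIM inversion, i.e. while a clean latent is mapped deterministically back toward noise. As background, here is a minimal sketch of the standard deterministic (eta = 0) DDIM inversion update, not code from the paper. `eps_model` is a hypothetical stand-in for the U-Net noise predictor and `alpha_bars` is an assumed cumulative noise schedule that decreases as noise increases; in an AttentionDrag-style pipeline the attention maps would be captured inside the `eps_model` calls.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def ddim_inversion_step(
    x_t: Vector,
    alpha_bar_t: float,
    alpha_bar_next: float,
    eps_model: Callable[[Vector, int], Vector],
    t: int,
) -> Vector:
    """One deterministic (eta = 0) DDIM inversion step from x_t to x_{t+1}.

    alpha_bar_* are cumulative products of the noise schedule; the next
    step is noisier, so alpha_bar_next < alpha_bar_t.
    """
    eps = eps_model(x_t, t)  # hypothetical noise predictor (the U-Net in practice)
    # Predict the clean sample x_0 implied by x_t and the predicted noise.
    x0_pred = [
        (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        for x, e in zip(x_t, eps)
    ]
    # Re-noise x_0 to the next (noisier) timestep, reusing the same noise estimate.
    return [
        math.sqrt(alpha_bar_next) * x0 + math.sqrt(1.0 - alpha_bar_next) * e
        for x0, e in zip(x0_pred, eps)
    ]


def ddim_invert(
    x0: Vector,
    alpha_bars: Sequence[float],
    eps_model: Callable[[Vector, int], Vector],
) -> List[Vector]:
    """Run inversion over a decreasing alpha_bar schedule; return every latent."""
    latents = [x0]
    x = x0
    for t in range(len(alpha_bars) - 1):
        x = ddim_inversion_step(x, alpha_bars[t], alpha_bars[t + 1], eps_model, t)
        latents.append(x)
    return latents
```

With a predictor that always returns zero noise, each step simply rescales the latent by sqrt(alpha_bar_next / alpha_bar_t), which is a quick sanity check on the update.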


๐Ÿ“ Abstract
Traditional point-based image editing methods rely on iterative latent optimization or geometric transformations, which are either inefficient or fail to capture the semantic relationships within the image. These methods often overlook the powerful yet underutilized image editing capabilities inherent in pre-trained diffusion models. In this work, we propose a novel one-step point-based image editing method, named AttentionDrag, which leverages the latent knowledge and feature correlations inherent in pre-trained diffusion models for image editing tasks. This framework achieves semantically consistent, high-quality manipulation without extensive re-optimization or retraining. Specifically, we reuse the latent correlation knowledge learned by the self-attention mechanism in the U-Net module during the DDIM inversion process to automatically identify and adjust relevant image regions, ensuring semantic validity and consistency. Additionally, AttentionDrag adaptively generates masks to guide the editing process, enabling precise, context-aware modifications through user-friendly interaction. Our results show performance that surpasses most state-of-the-art methods at significantly faster speeds, offering a more efficient and semantically coherent solution for point-based image editing tasks.
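
The abstract states that AttentionDrag reuses the correlations learned by U-Net self-attention to find image regions related to an edit and to generate masks adaptively, but it does not spell out the rule. The following is a hypothetical sketch of that idea: it computes a standard self-attention map A = softmax(QK^T / sqrt(d)) over N spatial tokens flattened in row-major order, reads the row belonging to the token under the user's handle point, and keeps every token whose weight is at least `keep_ratio` times that row's maximum. The names `token_index`, `correlation_mask`, and `keep_ratio`, and the thresholding rule itself, are assumptions for illustration rather than the paper's method.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q: Matrix, k: Matrix) -> Matrix:
    """Row-stochastic attention A = softmax(Q K^T / sqrt(d)) over N tokens."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        for qi in q
    ]


def token_index(row: int, col: int, width: int) -> int:
    """Map a (row, col) handle point on the latent grid to a flat token index."""
    return row * width + col


def correlation_mask(
    attn: Matrix, handle_idx: int, width: int, keep_ratio: float = 0.5
) -> List[List[int]]:
    """Binary mask of tokens the handle token attends to most strongly.

    Hypothetical adaptive rule: the threshold is relative to the row maximum,
    so the mask grows or shrinks with the attention distribution instead of
    using a fixed radius around the handle point.
    """
    row = attn[handle_idx]
    threshold = keep_ratio * max(row)
    flat = [1 if w >= threshold else 0 for w in row]
    # Reshape the flat mask back onto the latent grid.
    return [flat[i:i + width] for i in range(0, len(flat), width)]
```

In a real pipeline, `q` and `k` would be the query and key projections captured from U-Net self-attention layers; whether to average over heads, layers, or inversion timesteps is a design choice this sketch leaves open.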
Problem

Research questions and friction points this paper is trying to address.

Exploiting latent knowledge in diffusion models for editing
Improving semantic consistency in point-based image editing
Enhancing efficiency and precision in image manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages latent knowledge in diffusion models
Uses self-attention for semantic consistency
Adaptively generates masks for precise editing
Biao Yang
Shanghai Jiao Tong University, Antai College of Economics and Management
Asset Pricing, Climate Finance
Muqi Huang
Rajax Network Technology (ele.me), Alibaba Group
Yuhui Zhang
Stanford University
Machine Learning, Computer Vision, Natural Language Processing, Biotech
Yun Xiong
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
Kun Zhou
Rajax Network Technology (ele.me), Alibaba Group
Xi Chen
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
Shiyang Zhou
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
Huishuai Bao
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
Chuan Li
Rajax Network Technology (ele.me), Alibaba Group
Feng Shi
Rajax Network Technology (ele.me), Alibaba Group
Hualei Liu
Rajax Network Technology (ele.me), Alibaba Group