CannyEdit: Selective Canny Control and Dual-Prompt Guidance for Training-Free Image Editing

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing training-free region-based editing methods for text-to-image (T2I) models struggle to simultaneously ensure text fidelity within the edited region, contextual preservation in unedited regions, and natural boundary blending. This paper proposes a fine-tuning-free local editing framework that preserves structural integrity via selective Canny edge masking, jointly regulates generation using localized and global textual prompts, and injects region-specific features during ControlNet inference. Without modifying the pre-trained T2I model, the method enables precise text-driven editing while maintaining source-image detail consistency. Quantitative evaluation shows a 2.93–10.49% improvement in combined text alignment and contextual consistency metrics. In user studies, only 49.2% of general users and 42.0% of AIGC experts identified CannyEdit's results as AI-edited, versus 76.08–89.09% for competing methods, demonstrating significant gains in visual realism and practical usability.

📝 Abstract
Recent advances in text-to-image (T2I) models have enabled training-free regional image editing by leveraging the generative priors of foundation models. However, existing methods struggle to balance text adherence in edited regions, context fidelity in unedited areas, and seamless integration of edits. We introduce CannyEdit, a novel training-free framework that addresses these challenges through two key innovations: (1) Selective Canny Control, which masks the structural guidance of Canny ControlNet in user-specified editable regions while strictly preserving details of the source images in unedited areas via inversion-phase ControlNet information retention. This enables precise, text-driven edits without compromising contextual integrity. (2) Dual-Prompt Guidance, which combines local prompts for object-specific edits with a global target prompt to maintain coherent scene interactions. On real-world image editing tasks (addition, replacement, removal), CannyEdit outperforms prior methods like KV-Edit, achieving a 2.93 to 10.49 percent improvement in the balance of text adherence and context fidelity. In terms of editing seamlessness, user studies reveal only 49.2 percent of general users and 42.0 percent of AIGC experts identified CannyEdit's results as AI-edited when paired with real images without edits, versus 76.08 to 89.09 percent for competitor methods.
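The Dual-Prompt Guidance described above (a local prompt steering the edited object, a global target prompt keeping the whole scene coherent) can be sketched as a masked combination of guidance signals. The following numpy toy is a hypothetical illustration, not the paper's implementation: the function name, the guidance weights, and the exact blending rule are all assumptions.

```python
import numpy as np

def dual_prompt_guidance(eps_uncond, eps_local, eps_global,
                         region_mask, w_local=7.5, w_global=3.5):
    """Toy blend of local and global guidance terms (illustrative only).

    eps_* stand in for a diffusion model's noise predictions under the
    unconditional, local-prompt, and global-prompt conditions; region_mask
    is 1.0 inside the user-specified editable region, 0.0 elsewhere.
    """
    # Local prompt only influences the editable region.
    local_term = region_mask * (eps_local - eps_uncond)
    # Global target prompt influences the whole image for scene coherence.
    global_term = eps_global - eps_uncond
    return eps_uncond + w_local * local_term + w_global * global_term
```

With a zero unconditional prediction, a unit local prediction, and a half-image mask, the guidance signal is confined to the masked region, which is the intuition behind keeping edits localized while the global prompt shapes overall scene interactions.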
Problem

Research questions and friction points this paper is trying to address.

Balancing text adherence in edited regions with context fidelity in unedited areas
Ensuring seamless integration of edits without compromising contextual integrity
Improving precision and coherence in training-free regional image editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Canny Control for precise text-driven edits
Dual-Prompt Guidance for coherent scene interactions
Training-free framework with inversion-phase ControlNet retention
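The Selective Canny Control idea can be sketched as: build an edge-guidance map for the source image, then blank it inside the user's editable region so the ControlNet receives no structural constraint there while unedited areas keep full guidance. The sketch below uses a central-difference gradient magnitude as a stand-in for a true Canny detector (avoiding an OpenCV dependency); the function name and threshold are illustrative assumptions, not the paper's code.

```python
import numpy as np

def selective_edge_control(image: np.ndarray, edit_mask: np.ndarray,
                           thresh: float = 0.2) -> np.ndarray:
    """Edge-guidance map with structure masked out in the editable region.

    image: (H, W) grayscale float array in [0, 1]
    edit_mask: (H, W) boolean array, True where the user wants to edit
    Returns a binary edge map; a real pipeline would use Canny edges here.
    """
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]   # horizontal central difference
    gy[1:-1, :] = image[2:, :] - image[:-2, :]   # vertical central difference
    edges = (np.hypot(gx, gy) > thresh).astype(np.float32)
    edges[edit_mask] = 0.0  # remove structural guidance inside editable region
    return edges

# Toy usage: a vertical step edge, with the top half marked editable.
img = np.zeros((8, 8), dtype=np.float32)
img[:, 4:] = 1.0
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :] = True
guide = selective_edge_control(img, mask)
```

In the toy example, the step edge survives only in the bottom (unedited) half of the guidance map, mirroring how the framework preserves source structure outside the edit while leaving the model free to generate new content inside it.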
👥 Authors
Weiyan Xie — HKUST (Artificial Intelligence)
Han Gao — Huawei Hong Kong AI Framework & Data Technologies Lab, The Hong Kong University of Science and Technology
Didan Deng — The Hong Kong University of Science and Technology (Deep Learning)
Kaican Li — The Hong Kong University of Science and Technology
April Hua Liu — Shanghai University of Finance and Economics
Yongxiang Huang — Huawei Hong Kong AI Framework & Data Technologies Lab
Nevin L. Zhang — The Hong Kong University of Science and Technology (AI, Machine Learning, Traditional Chinese Medicine)