PairEdit: Learning Semantic Variations for Exemplar-based Image Editing

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image editing methods heavily rely on textual prompts to describe editing semantics, making precise characterization of complex visual transformations challenging. This paper introduces the first purely image-driven few-shot (even single-pair) semantic editing framework, requiring no textual guidance. Our method leverages diffusion models to achieve high-fidelity editing. Key contributions include: (1) a target noise prediction mechanism that explicitly models the semantic transformation direction between image pairs; (2) a content-preserving adaptive noise scheduling strategy to enhance structural consistency; and (3) a dual-LoRA decoupled training scheme that separately learns editing style and content representations. Extensive qualitative and quantitative evaluations demonstrate state-of-the-art performance, with significantly improved content consistency over existing baselines.
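The summary's first contribution, a target noise prediction with an explicit semantic direction, can be illustrated with a minimal sketch. The exact formulation is not given on this page, so the function name, the additive form, and the guidance weight `w` below are all assumptions: the semantic variation is modeled as the difference between the noise predicted for the target image and the noise predicted for the source image, scaled and added back to the source prediction.

```python
import numpy as np

def target_noise(eps_src: np.ndarray, eps_tgt: np.ndarray, w: float = 1.0) -> np.ndarray:
    """Hypothetical target noise: source prediction plus a scaled guidance
    direction (eps_tgt - eps_src) that encodes the semantic variation."""
    direction = eps_tgt - eps_src      # direction of the edit in noise space
    return eps_src + w * direction     # w = 1 recovers eps_tgt exactly

# Toy usage with dummy noise maps.
eps_src = np.zeros((4, 4))
eps_tgt = np.ones((4, 4))
print(np.allclose(target_noise(eps_src, eps_tgt, w=1.0), eps_tgt))  # True
```

Setting `w = 0` falls back to the source prediction, while intermediate values would interpolate the edit strength.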

📝 Abstract
Recent advancements in text-guided image editing have achieved notable success by leveraging natural language prompts for fine-grained semantic control. However, certain editing semantics are challenging to specify precisely using textual descriptions alone. A practical alternative involves learning editing semantics from paired source-target examples. Existing exemplar-based editing methods still rely on text prompts describing the change within paired examples or learning implicit text-based editing instructions. In this paper, we introduce PairEdit, a novel visual editing method designed to effectively learn complex editing semantics from a limited number of image pairs or even a single image pair, without using any textual guidance. We propose a target noise prediction that explicitly models semantic variations within paired images through a guidance direction term. Moreover, we introduce a content-preserving noise schedule to facilitate more effective semantic learning. We also propose optimizing distinct LoRAs to disentangle the learning of semantic variations from content. Extensive qualitative and quantitative evaluations demonstrate that PairEdit successfully learns intricate semantics while significantly improving content consistency compared to baseline methods. Code will be available at https://github.com/xudonmao/PairEdit.
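The abstract also mentions a content-preserving noise schedule. The paper's actual schedule is not described on this page, so the following is only a sketch under an assumed design: instead of sampling diffusion timesteps uniformly during training, bias sampling toward lower-noise timesteps, where the latent still retains most of the source image's structure. The function name and the skew parameter `gamma` are hypothetical.

```python
import numpy as np

def sample_timesteps(n, T=1000, gamma=2.0, rng=None):
    """Sample n timesteps in [0, T) with density skewed toward small t
    (low noise), a plausible way to favor content preservation."""
    rng = rng or np.random.default_rng()
    u = rng.random(n)          # uniform in [0, 1)
    t = (u ** gamma) * T       # gamma > 1 pushes probability mass toward 0
    return t.astype(int)

ts = sample_timesteps(10_000, rng=np.random.default_rng(0))
print(ts.mean() < 500)         # skew check: mean is well below T/2 -> True
```

With `gamma = 2` the expected timestep is roughly `T / 3`, so training sees low-noise steps far more often than a uniform schedule would.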
Problem

Research questions and friction points this paper is trying to address.

Learning editing semantics from image pairs without text
Modeling semantic variations in paired images explicitly
Disentangling semantic learning from content for better consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns editing semantics from image pairs
Uses target noise prediction for variations
Optimizes LoRAs to disentangle semantic learning
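The dual-LoRA idea from the innovation list can be sketched in a few lines of NumPy. All shapes, names, and initialization scales below are assumptions, not the paper's implementation: a frozen base weight `W` is augmented by two independent low-rank updates, one intended to capture the editing semantics and one the content, so the two can be trained and applied separately.

```python
import numpy as np

rng = np.random.default_rng(0)
d, rank = 8, 2
W = rng.normal(size=(d, d))                # frozen base weight

def lora(rng, d, rank, scale=0.01):
    """One low-rank adapter: its weight update is B @ A (rank << d)."""
    A = rng.normal(scale=scale, size=(rank, d))
    B = rng.normal(scale=scale, size=(d, rank))
    return A, B

A_sem, B_sem = lora(rng, d, rank)          # semantic-variation adapter
A_con, B_con = lora(rng, d, rank)          # content adapter

def forward(x, use_sem=True, use_con=True):
    """Apply the base weight plus whichever adapters are enabled."""
    W_eff = W.copy()
    if use_sem:
        W_eff += B_sem @ A_sem
    if use_con:
        W_eff += B_con @ A_con
    return x @ W_eff.T

x = rng.normal(size=(1, d))
print(forward(x, use_sem=True, use_con=False).shape)   # (1, 8)
```

Because the two adapters are separate parameter sets added to the same frozen weight, they can be toggled independently at inference, which is what makes a disentangled semantics/content decomposition usable in practice.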
👥 Authors
Haoguang Lu — Sun Yat-sen University
Jiacheng Chen — Sun Yat-sen University
Zhenguo Yang — Guangdong University of Technology
Aurele Tohokantche Gnanha — Huawei Noah’s Ark Laboratory
Fu Lee Wang — Hong Kong Metropolitan University
Li Qing — The Hong Kong Polytechnic University
Xudong Mao — Sun Yat-sen University