RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

📅 2025-06-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing single-reference image editing methods struggle to model and transfer non-rigid, content-aware visual relationships. To address this, we propose the first few-shot visual relation editing framework designed for editing-intent generalization. First, we introduce RelationAdapter, a lightweight module that integrates explicit relational modeling into the Diffusion Transformer (DiT) architecture, enabling context-aware extraction and transfer of editing intents. Second, we construct Relation252K, a large-scale benchmark encompassing 218 distinct relational editing tasks, filling a critical gap in evaluation resources for this domain. Extensive experiments demonstrate that our method significantly outperforms single-reference baselines in editing accuracy, generation quality, and cross-image intent transfer. This work shifts the focus of visual editing from superficial appearance adjustment toward deep semantic relationship understanding.

📝 Abstract
Inspired by the in-context learning mechanism of large language models (LLMs), a new paradigm of generalizable visual prompt-based image editing is emerging. Existing single-reference methods typically focus on style or appearance adjustments and struggle with non-rigid transformations. To address these limitations, we propose leveraging source-target image pairs to extract and transfer content-aware editing intent to novel query images. To this end, we introduce RelationAdapter, a lightweight module that enables Diffusion Transformer (DiT) based models to effectively capture and apply visual transformations from minimal examples. We also introduce Relation252K, a comprehensive dataset comprising 218 diverse editing tasks, to evaluate model generalization and adaptability in visual prompt-driven scenarios. Experiments on Relation252K show that RelationAdapter significantly improves the model's ability to understand and transfer editing intent, leading to notable gains in generation quality and overall editing performance.
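This page does not spell out the architecture beyond the abstract. As a rough illustration of the stated idea (pooling a source-target example pair into "editing intent" tokens and injecting them into a DiT block), a minimal PyTorch sketch might look like the following; the class name aside, every dimension, layer choice, and injection point here is an assumption, not the paper's actual design:

```python
# Hypothetical sketch of a RelationAdapter-style module (names and wiring
# are assumptions; the paper's actual architecture may differ).
import torch
import torch.nn as nn


class RelationAdapter(nn.Module):
    """Pools a (source, target) example pair into compact 'editing intent'
    tokens and injects them into DiT hidden states via cross-attention."""

    def __init__(self, dim: int = 1024, num_tokens: int = 16, num_heads: int = 8):
        super().__init__()
        # Learnable queries that summarize the example pair into a small
        # set of relation tokens.
        self.relation_queries = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.pair_encoder = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention that lets DiT hidden states read the relation tokens.
        self.inject = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def encode_pair(self, src_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
        # src_feats, tgt_feats: (B, N, dim) patch features of the example pair.
        pair = torch.cat([src_feats, tgt_feats], dim=1)  # (B, 2N, dim)
        q = self.relation_queries.unsqueeze(0).expand(pair.size(0), -1, -1)
        relation_tokens, _ = self.pair_encoder(q, pair, pair)  # (B, num_tokens, dim)
        return relation_tokens

    def forward(self, hidden: torch.Tensor, relation_tokens: torch.Tensor) -> torch.Tensor:
        # hidden: (B, L, dim) DiT hidden states for the query image.
        attn_out, _ = self.inject(self.norm(hidden), relation_tokens, relation_tokens)
        return hidden + attn_out  # residual injection keeps the base model intact
```

The residual form matters for a "lightweight module" claim: the base DiT weights stay frozen and untouched, and only the adapter parameters need training.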
Problem

Research questions and friction points this paper is trying to address.

Enabling non-rigid image transformations via visual prompts
Transferring editing intent from source-target pairs to new images
Improving generalization in visual prompt-driven editing tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages source-target pairs for content-aware editing
Introduces RelationAdapter for DiT-based transformation learning (see the sketch after this list)
Uses Relation252K dataset for evaluating model generalization
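Taken together, the first two bullets describe a pair-then-query transfer at inference time: extract the intent from one example pair, then condition generation on the new image. A toy invocation of the hypothetical sketch above, with random tensors standing in for real DiT patch features and illustrative shapes only:

```python
# Continues the sketch above; tensors are stand-ins, not real image features.
import torch

adapter = RelationAdapter(dim=1024)

src_feats = torch.randn(1, 256, 1024)     # example source image features
tgt_feats = torch.randn(1, 256, 1024)     # example target (edited) image features
query_hidden = torch.randn(1, 256, 1024)  # DiT hidden states for the query image

intent = adapter.encode_pair(src_feats, tgt_feats)  # "what edit was applied?"
conditioned = adapter(query_hidden, intent)         # "apply it to this image"
print(conditioned.shape)  # torch.Size([1, 256, 1024])
```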
👥 Authors
Yan Gong
Zhejiang University
Yiren Song
Ph.D. student, National University of Singapore
Generative AI, Diffusion, Unified model
Yicheng Li
Zhejiang University
Computer Science
Chenglin Li
Zhejiang University
Yin Zhang
Zhejiang University