Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers

📅 2026-02-09
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work addresses two limitations of existing image editing methods: inaccurate modeling of geometric transformations such as translation, rotation, and scaling, and unrealistic rendering of complex lighting and shadow effects. To overcome these challenges, we propose GeoEdit, a framework that uses a diffusion transformer for in-context image inpainting while explicitly integrating geometric transformations into the generative process. Additionally, we introduce Effects-Sensitive Attention to improve the consistency of lighting and shadow effects. To support training, we construct RS-Objects, a high-quality dataset of over 120,000 image pairs. Extensive experiments on public benchmarks demonstrate that our approach consistently outperforms state-of-the-art methods in visual quality, geometric accuracy, and realism.
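
The geometric instructions named above (translation, rotation, scaling) can be written as a single 2D affine transform. The following is a minimal sketch under our own assumptions, not GeoEdit's actual parameterization: the helper names `geometric_instruction_to_affine` and `transform_points` and their parameters are hypothetical. It scales and rotates an object about its center, then shifts it, and maps bounding-box corners to their new positions.

```python
import numpy as np

def geometric_instruction_to_affine(center, dx=0.0, dy=0.0, angle_deg=0.0, scale=1.0):
    """Hypothetical helper: build a 3x3 homogeneous affine matrix that scales
    and rotates an object about its center, then translates it by (dx, dy).

    In image coordinates (y axis pointing down), a positive angle appears
    clockwise on screen.
    """
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    # Move the object center to the origin.
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    # Rotate and uniformly scale about the origin.
    rotate_scale = np.array([[scale * c, -scale * s, 0],
                             [scale * s,  scale * c, 0],
                             [0, 0, 1]], dtype=float)
    # Move back to the center, plus the requested translation.
    back_and_shift = np.array([[1, 0, cx + dx], [0, 1, cy + dy], [0, 0, 1]], dtype=float)
    return back_and_shift @ rotate_scale @ to_origin

def transform_points(matrix, points):
    """Apply a 3x3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (homogeneous @ matrix.T)[:, :2]
```

For example, `transform_points(geometric_instruction_to_affine((20, 20), dx=50, scale=2.0), [[10, 10], [30, 30]])` maps the box corners to (50, 0) and (90, 40). Composing about the object center keeps rotation and scaling from also displacing the object, so translation stays a separate, explicit parameter.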

📝 Abstract
Recent advances in diffusion models have significantly improved image editing. However, challenges persist in handling geometric transformations, such as translation, rotation, and scaling, particularly in complex scenes. Existing approaches suffer from two main limitations: (1) difficulty in achieving accurate geometric editing of object translation, rotation, and scaling; (2) inadequate modeling of intricate lighting and shadow effects, leading to unrealistic results. To address these issues, we propose GeoEdit, a framework that leverages in-context generation through a diffusion transformer module, which integrates geometric transformations for precise object edits. Moreover, we introduce Effects-Sensitive Attention, which enhances the modeling of intricate lighting and shadow effects for improved realism. To further support training, we construct RS-Objects, a large-scale geometric editing dataset containing over 120,000 high-quality image pairs, enabling the model to learn precise geometric editing while generating realistic lighting and shadows. Extensive experiments on public benchmarks demonstrate that GeoEdit consistently outperforms state-of-the-art methods in terms of visual quality, geometric accuracy, and realism.
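
The abstract does not spell out how the in-context input is arranged. One common pattern for in-context editing with diffusion transformers is a side-by-side layout: the source image serves as context, and a second panel holds the region to be regenerated. The sketch below assumes that layout; `build_in_context_input`, `dilate`, and the `effect_margin` parameter are hypothetical, and widening the mask by a margin is only a crude stand-in for the idea that lighting and shadow effects extend beyond the object's own pixels. It is not the paper's Effects-Sensitive Attention.

```python
import numpy as np

def dilate(mask, radius):
    """Binary dilation of an (H, W) bool mask with a square window (pure NumPy)."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def build_in_context_input(image, old_mask, new_mask, effect_margin=0):
    """Hypothetical side-by-side layout: [source | canvas to inpaint].

    image: (H, W, C) float array; old_mask / new_mask: (H, W) bool arrays
    marking the object's original and target footprints.
    effect_margin: hypothetical extra pixels around both footprints, a crude
    proxy for shadow and lighting regions that also need regenerating.
    """
    edit_region = dilate(old_mask | new_mask, effect_margin)
    canvas = image.copy()
    canvas[edit_region] = 0.0  # blank the pixels the model must regenerate
    diptych = np.concatenate([image, canvas], axis=1)  # shape (H, 2W, C)
    # Left panel is pure context (never inpainted); right panel is edited.
    inpaint_mask = np.concatenate([np.zeros_like(edit_region), edit_region], axis=1)
    return diptych, inpaint_mask
```

A nonzero `effect_margin` enlarges the region the model may repaint, giving it room to remove the old shadow and draw a new one; the trade-off is that more of the original background must be regenerated.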
Problem

geometric editing
image inpainting
lighting effects
shadow modeling
diffusion models
Innovation

diffusion transformer
geometric editing
in-context inpainting
effects-sensitive attention
RS-Objects dataset
Authors
Shuo Zhang
PRIS, Beijing University of Posts and Telecommunications
Wenzhuo Wu
PRIS, Beijing University of Posts and Telecommunications
Huayu Zhang
Senior Engineer, Huawei Technologies Co., Ltd
Distributed System, Network Science, Machine Learning, Optimization, Graph Theory
Jiarong Cheng
Institute of Artificial Intelligence (TeleAI), China Telecom; Beijing Institute of Technology
Xianghao Zang
Institute of Artificial Intelligence (TeleAI), China Telecom
Chao Ban
Institute of Artificial Intelligence (TeleAI), China Telecom
Hao Sun
Institute of Artificial Intelligence (TeleAI), China Telecom
Zhongjiang He
Institute of Artificial Intelligence (TeleAI), China Telecom
Tianwei Cao
PRIS, Beijing University of Posts and Telecommunications
Kongming Liang
Beijing University of Posts and Telecommunications
Computer Vision, Pattern Recognition, Machine Learning
Zhanyu Ma
Beijing University of Posts and Telecommunications
Pattern Recognition, Machine Learning, Computer Vision, Multimedia Technology, Deep Learning