🤖 AI Summary
Existing diffusion Transformer-based image editing methods apply uniform denoising across all image tokens, causing redundant computation and distortion in unmodified regions. To address this, we propose a training-free local editing framework built on a novel selective region-updating mechanism. Our SpotSelector identifies perceptually stable regions via similarity matching and skips denoising within them; SpotFusion performs context-aware, adaptive feature fusion; and conditional-image feature reuse is combined with token-level sparse denoising. The method substantially reduces computational overhead, achieving an average 42% FLOPs reduction, while strictly preserving high fidelity in unedited areas (LPIPS improves by 0.18). It attains state-of-the-art performance across diverse local editing tasks, including mask-guided editing and object replacement.
📝 Abstract
Diffusion Transformer models have significantly advanced image editing by encoding conditional images and integrating them into transformer layers. However, most edits involve modifying only small regions, while current methods uniformly process and denoise all tokens at every timestep, causing redundant computation and potentially degrading unchanged areas. This raises a fundamental question: Is it truly necessary to regenerate every region during editing? To address this, we propose SpotEdit, a training-free diffusion editing framework that selectively updates only the modified regions. SpotEdit comprises two key components: SpotSelector, which identifies stable regions via perceptual similarity and skips their computation by reusing conditional-image features, and SpotFusion, which adaptively blends these features with edited tokens through a dynamic fusion mechanism, preserving contextual coherence and editing quality. By reducing unnecessary computation and maintaining high fidelity in unmodified areas, SpotEdit achieves efficient and precise image editing.
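The two components described above can be illustrated with a minimal sketch. This is not the paper's implementation: the cosine-similarity stability test, the threshold `tau`, and the fixed blending weight `alpha` are all illustrative assumptions standing in for SpotEdit's perceptual-similarity matching and dynamic fusion; the real method operates on diffusion-transformer token features at each denoising step.

```python
import numpy as np

def spot_selector(cond_feats, edit_feats, tau=0.9):
    """Hypothetical SpotSelector sketch: mark a token as "stable" when its
    current feature stays close (cosine similarity >= tau) to the
    corresponding conditional-image feature, so its denoising can be skipped.

    cond_feats, edit_feats: (num_tokens, dim) arrays.
    Returns a boolean mask of stable tokens.
    """
    num = (cond_feats * edit_feats).sum(axis=-1)
    den = (np.linalg.norm(cond_feats, axis=-1)
           * np.linalg.norm(edit_feats, axis=-1) + 1e-8)
    return (num / den) >= tau

def spot_fusion(cond_feats, edit_feats, stable, alpha=0.8):
    """Hypothetical SpotFusion sketch: reuse conditional-image features in
    stable regions (blended with weight alpha for context coherence) and
    keep the freshly denoised features for edited tokens."""
    fused = edit_feats.copy()
    fused[stable] = alpha * cond_feats[stable] + (1 - alpha) * edit_feats[stable]
    return fused

# Toy usage: token 0 matches its conditional feature (stable, reused);
# token 1 has diverged (being edited, kept as-is).
cond = np.array([[1.0, 0.0], [0.0, 1.0]])
edit = np.array([[1.0, 0.0], [1.0, 0.0]])
stable = spot_selector(cond, edit, tau=0.9)
fused = spot_fusion(cond, edit, stable, alpha=1.0)
```

Skipping denoising for tokens flagged by the selector is what yields the sparse, token-level computation: only the unstable (edited) tokens pass through the transformer at each step, while stable tokens are served from the cached conditional features.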