Semantic Granularity Navigation in Image Editing

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the tension in real-image editing between semantic editability and structural fidelity. The authors trace it to an implicit coupling of edit progress with model scale: stronger edits typically require visiting noisier states, which destabilizes layout before the semantic change is well localized. They propose NaviEdit, a training-free inference-time controller that decouples edit progress from scale traversal through a strict self-consistency contract. It operates at the rollout level, treats scale as a control input, and reallocates a fixed step budget toward semantically responsive intermediate scales rather than destructive high-noise regimes, leaving the pretrained model unchanged. Experiments report positive average gains across compatible editors and flow backbones, which the authors take as support for decoupling as a portable inference-time control principle.
📝 Abstract
Despite the generative capabilities of diffusion and flow models, real-image editing remains constrained by a persistent trade-off between semantic editability and structural fidelity. We trace a primary cause of this limitation to the implicit coupling of edit progress with model scale in existing paradigms. Under this coupling, stronger edits typically require visiting noisier states, which spends computation on destabilizing layout before the semantic change is well localized. We introduce NaviEdit, a training-free inference-time controller that decouples edit progress from model scale traversal through a strict self-consistency contract. NaviEdit operates at the rollout level and leaves the underlying pretrained model unchanged. It treats scale as a control input and reallocates a fixed step budget toward semantically responsive intermediate scales instead of destructive high-noise regimes. Experiments show positive average gains across compatible editors and flow backbones, supporting decoupling as a portable inference-time control principle.
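To make the idea of reallocating a fixed step budget concrete, here is a minimal sketch in Python. It is not NaviEdit's algorithm, which the abstract does not specify. The function `allocate_steps`, the five scale bins, and the responsiveness weights are all hypothetical, chosen only to show how the same total number of steps can be shifted toward intermediate scales.

```python
from typing import List, Sequence


def allocate_steps(budget: int, weights: Sequence[float]) -> List[int]:
    """Split a fixed integer step budget across scale bins.

    Hypothetical helper (not from the paper): each bin receives a share of
    `budget` proportional to its weight, using largest-remainder rounding so
    the shares always sum to exactly `budget`.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if len(weights) == 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be a non-empty sequence of non-negative numbers")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Ideal (fractional) share per bin, then round down.
    ideal = [budget * w / total for w in weights]
    steps = [int(x) for x in ideal]

    # Hand leftover steps to the bins with the largest fractional parts.
    leftover = budget - sum(steps)
    by_fraction = sorted(
        range(len(ideal)), key=lambda i: ideal[i] - steps[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        steps[i] += 1
    return steps


# Bins ordered from low-noise to high-noise scales (assumed layout).
# Uniform weights spread the budget evenly across all scales.
uniform = allocate_steps(20, [1, 1, 1, 1, 1])

# Assumed responsiveness profile: intermediate scales weighted most,
# the highest-noise bin weighted zero.
focused = allocate_steps(20, [1, 3, 4, 2, 0])

print("uniform:", uniform)
print("focused:", focused)
```

Both schedules spend the same 20 steps; only their distribution over scales differs. In a real controller the weights would have to come from some measurement of semantic responsiveness, which this sketch simply assumes as given.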
Problem

Research questions and friction points this paper is trying to address.

semantic editability
structural fidelity
image editing
diffusion models
flow models
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic granularity
scale decoupling
self-consistency
training-free editing
diffusion/flow control