CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing unified diffusion-based image editing models often suffer from task interference and modality conflicts when handling multiple editing conditions, producing artifacts such as color bleeding, identity distortion, and style drift. This work proposes a lightweight latent-attention router that dynamically allocates computation during the diffusion process across four specialized expert modules (Text, Mask, Reference, and Base) via a condition-aware routing mechanism. By combining Mask Repaint, sparse Top-K expert selection, and Latent Mixture fusion, the method coherently integrates semantic, spatial, and stylistic information, significantly mitigating multi-condition conflicts and improving both accuracy and consistency across diverse editing tasks, including object removal, replacement, text-driven editing, and style transfer.

📝 Abstract
Unified diffusion editors often rely on a fixed, shared backbone for diverse tasks, suffering from task interference and poor adaptation to heterogeneous demands (e.g., local vs. global, semantic vs. photometric). In particular, prevalent ControlNet and OmniControl variants combine multiple conditioning signals (e.g., text, mask, reference) via static concatenation or additive adapters, which cannot dynamically prioritize or suppress conflicting modalities, resulting in artifacts such as color bleeding across mask boundaries, identity or style drift, and unpredictable behavior under multi-condition inputs. To address this, we propose Condition-Aware Routing of Experts (CARE-Edit), which aligns model computation with specific editing competencies. At its core, a lightweight latent-attention router assigns encoded diffusion tokens to four specialized experts (Text, Mask, Reference, and Base) based on multi-modal conditions and diffusion timesteps: (i) a Mask Repaint module first refines coarse user-defined masks for precise spatial guidance; (ii) the router applies sparse top-K selection to dynamically allocate computation to the most relevant experts; (iii) a Latent Mixture module subsequently fuses expert outputs, coherently integrating semantic, spatial, and stylistic information into the base image. Experiments validate CARE-Edit's strong performance on contextual editing tasks, including erasure, replacement, text-driven edits, and style transfer. Empirical analysis further reveals task-specific behavior of the specialized experts, underscoring the importance of dynamic, condition-aware processing for mitigating multi-condition conflicts.
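The routing mechanism the abstract describes (steps ii and iii) can be sketched in a few lines. The paper does not publish this code; the snippet below is a minimal numpy illustration under assumed shapes and names (`route_tokens`, `latent_mixture`, `w_router` are all hypothetical): each token is scored against the four experts from pooled condition and timestep embeddings, only the top-K expert weights are kept and renormalized (sparse gating), and expert outputs are fused as a gate-weighted mixture.

```python
import numpy as np

EXPERTS = ["text", "mask", "reference", "base"]  # the four experts named in the abstract

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def route_tokens(tokens, cond_embed, timestep_embed, w_router, top_k=2):
    """Score each token against the four experts using the pooled condition
    and timestep context, keep only the top-K logits per token, and
    renormalize (sparse top-K gating). Shapes and router design are
    assumptions, not the paper's implementation."""
    ctx = np.concatenate([cond_embed, timestep_embed])            # (c,)
    feats = np.concatenate(
        [tokens, np.tile(ctx, (tokens.shape[0], 1))], axis=1)    # (n, d + c)
    logits = feats @ w_router                                     # (n, 4)
    # Zero out all but the K largest logits per token via a -inf mask.
    kth = np.sort(logits, axis=1)[:, -top_k][:, None]
    masked = np.where(logits >= kth, logits, -np.inf)
    return softmax(masked, axis=1)                                # (n, 4), rows sum to 1

def latent_mixture(tokens, gates, experts):
    """Fuse per-expert outputs as a gate-weighted sum (Latent Mixture sketch)."""
    outs = np.stack([f(tokens) for f in experts], axis=1)         # (n, 4, d)
    return (gates[:, :, None] * outs).sum(axis=1)                 # (n, d)
```

With `top_k=2`, each token activates exactly two experts, so a token inside the repainted mask region could, for instance, draw on the Mask and Reference experts while background tokens fall back to the Base expert; non-selected experts contribute nothing, which is how the sparse gate suppresses conflicting modalities.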
Problem

Research questions and friction points this paper is trying to address.

diffusion editing
multi-condition conflict
task interference
heterogeneous editing demands
condition fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Condition-Aware Routing
Mixture of Experts
Diffusion-based Image Editing
Dynamic Modality Fusion
Contextual Editing