From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing training-free, text-guided image editing methods follow a competitive paradigm in which the editing and reconstruction branches are not coordinated, leading to semantic conflicts and unpredictable outcomes. This work proposes CoEdit, a zero-shot framework built on a coopetitive (cooperative-competitive) mechanism. In the spatial domain, Dual-Entropy Attention Manipulation improves the localization of editable and preservable regions. In the temporal domain, Entropic Latent Refinement dynamically adjusts latent representations during denoising to limit accumulated editing errors. The paper also proposes the Fidelity-Constrained Editing Score, a composite metric that jointly evaluates semantic editing and background fidelity. Without any additional training, CoEdit is reported to outperform prior methods on standard benchmarks in both editing quality and structural preservation.

📝 Abstract

Text-guided image editing, a pivotal task in modern multimedia content creation, has seen remarkable progress with training-free methods that eliminate the need for additional optimization. However, existing methods are typically constrained by a competitive paradigm in which the editing and reconstruction branches are independently driven by their respective objectives to maximize alignment with the target and source prompts. This adversarial strategy causes semantic conflicts and unpredictable outcomes because the branches are not coordinated. To overcome these issues, we propose Coopetitive Training-Free Image Editing (CoEdit), a novel zero-shot framework that transforms attention control from competition to coopetitive negotiation, achieving editing harmony across spatial and temporal dimensions. Spatially, CoEdit introduces Dual-Entropy Attention Manipulation, which quantifies directional entropic interactions between branches to reformulate attention control as a harmony-maximization problem, ultimately improving the localization of editable and preservable regions. Temporally, we present the Entropic Latent Refinement mechanism, which dynamically adjusts latent representations over time, minimizing accumulated editing errors and ensuring consistent semantic transitions throughout the denoising trajectory. Additionally, we propose the Fidelity-Constrained Editing Score, a composite metric that jointly evaluates semantic editing and background fidelity. Extensive experiments on standard benchmarks demonstrate that CoEdit achieves superior performance in both editing quality and structural preservation, enhancing multimedia information utilization by enabling more effective interaction between visual and textual modalities. The code will be available at https://github.com/JinhaoShen/CoEdit.
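
The abstract does not give the formulas behind "directional entropic interactions", so the following is only a generic illustration and not CoEdit's actual formulation. It assumes each branch yields a normalized attention distribution over image tokens, and it uses Shannon entropy and KL divergence as stand-in quantities. KL divergence is chosen here simply because it is asymmetric, which makes it one plausible way to measure an interaction in each direction between two branches. The logits and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw attention logits into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): asymmetric, so swapping the arguments changes the value."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical attention logits of one query over four image tokens.
src_attn = softmax([2.0, 0.5, 0.1, 0.1])  # reconstruction (source) branch
tgt_attn = softmax([0.2, 0.3, 2.5, 0.1])  # editing (target) branch

# Per-branch entropy: how spread out each branch's attention is.
h_src = entropy(src_attn)
h_tgt = entropy(tgt_attn)

# Two directional divergences between the branches (illustrative only).
kl_src_to_tgt = kl_divergence(src_attn, tgt_attn)
kl_tgt_to_src = kl_divergence(tgt_attn, src_attn)
```

In this toy setting, a large divergence in either direction marks a token set where the two branches disagree about where to attend. How CoEdit actually turns such quantities into a harmony-maximization objective is not specified in the abstract.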

Problem

Research questions and friction points this paper is trying to address.

text-guided image editing

training-free

semantic conflict

attention control

coopetition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Coopetitive Editing

Training-Free Image Editing

Dual-Entropy Attention