Shifting the Breaking Point of Flow Matching for Multi-Instance Editing

📅 2026-02-09
🤖 AI Summary
This work addresses independent, non-interfering editing of multiple regions in a single image, a multi-instance setting that existing flow-matching-based editors handle poorly. The authors attribute the problem to globally conditioned velocity fields and joint attention, which entangle concurrent edits, and propose Instance-Disentangled Attention: a mechanism that partitions joint attention so that each instance-specific textual instruction is bound to its spatial region during velocity field estimation. This enables instance-level editing in a single pass. The method is evaluated on natural images and on a newly introduced benchmark of text-dense infographics with region-level instructions, where the reported results show improved edit disentanglement and locality while global output coherence is preserved.

📝 Abstract
Flow matching models have recently emerged as an efficient alternative to diffusion, especially for text-guided image generation and editing, offering faster inference through continuous-time dynamics. However, existing flow-based editors predominantly support global or single-instruction edits and struggle with multi-instance scenarios, where multiple parts of a reference input must be edited independently without semantic interference. We identify this limitation as a consequence of globally conditioned velocity fields and joint attention mechanisms, which entangle concurrent edits. To address this issue, we introduce Instance-Disentangled Attention, a mechanism that partitions joint attention operations, enforcing binding between instance-specific textual instructions and spatial regions during velocity field estimation. We evaluate our approach on both natural image editing and a newly introduced benchmark of text-dense infographics with region-level editing instructions. Experimental results demonstrate that our approach promotes edit disentanglement and locality while preserving global output coherence, enabling single-pass, instance-level editing.
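The abstract does not spell out how the joint attention is partitioned, so the following is only a minimal sketch of one plausible masking scheme, not the authors' implementation. It assumes a joint sequence laid out as [prompt tokens of instance 0, ..., prompt tokens of instance K-1, image tokens], and it assumes that prompt tokens may interact only with their own prompt and with image tokens inside their own region, while image tokens still attend to every image token so that global coherence has a path. The names `build_instance_mask`, `masked_attention`, `text_lens`, and `region_ids` are hypothetical.

```python
import numpy as np

def build_instance_mask(text_lens, region_ids):
    """Boolean mask (True = attention allowed) over [text tokens | image tokens].

    text_lens[k]  : number of prompt tokens for instance k (assumed layout).
    region_ids[j] : index of the instance whose region contains image token j,
                    or -1 for background tokens that belong to no instance.
    """
    region_ids = np.asarray(region_ids)
    n_text = int(sum(text_lens))
    n_img = region_ids.shape[0]
    # Instance index that owns each prompt token, e.g. [0, 0, 1, 1, 1].
    text_owner = np.repeat(np.arange(len(text_lens)), text_lens)

    allowed = np.zeros((n_text + n_img, n_text + n_img), dtype=bool)
    # Text-to-text: a prompt only sees tokens of the same instance.
    allowed[:n_text, :n_text] = text_owner[:, None] == text_owner[None, :]
    # Text-to-image and image-to-text: only inside the bound region.
    bind = text_owner[:, None] == region_ids[None, :]
    allowed[:n_text, n_text:] = bind
    allowed[n_text:, :n_text] = bind.T
    # Image-to-image: left fully connected (an assumption made here).
    allowed[n_text:, n_text:] = True
    return allowed

def masked_attention(q, k, v, allowed):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(allowed, scores, -np.inf)
    # Every row keeps at least one allowed entry, so the max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Two instances (2 and 3 prompt tokens) and four image tokens:
# two in region 0, one in region 1, one background.
mask = build_instance_mask([2, 3], [0, 0, 1, -1])
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((9, 8)) for _ in range(3))
out, weights = masked_attention(q, k, v, mask)
```

In this sketch the prompt for instance 0 places zero attention weight on the prompt for instance 1 and on image tokens outside region 0, which is one way to read "binding between instance-specific textual instructions and spatial regions". Whether the actual method also restricts image-to-image attention is not stated in the abstract.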
Problem

Research questions and friction points this paper is trying to address.

flow matching
multi-instance editing
image editing
semantic interference
instance disentanglement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching
Multi-Instance Editing
Instance-Disentangled Attention
Text-Guided Image Editing
Velocity Field