Shifting the Breaking Point of Flow Matching for Multi-Instance Editing

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets a limitation of existing flow-matching-based image editors: they cannot edit multiple regions of an image simultaneously and independently without the edits interfering with one another. The authors attribute this to globally conditioned velocity fields and joint attention, which entangle concurrent edits, and propose Instance-Disentangled Attention, a mechanism that partitions joint attention so that each textual instruction is bound to its corresponding spatial region during velocity field estimation. This enables localized, instance-level editing in a single pass. The method is evaluated on natural images and on a newly introduced benchmark of text-dense infographics with region-level editing instructions, where the results indicate improved edit disentanglement and locality while global coherence is preserved.

📝 Abstract

Flow matching models have recently emerged as an efficient alternative to diffusion, especially for text-guided image generation and editing, offering faster inference through continuous-time dynamics. However, existing flow-based editors predominantly support global or single-instruction edits and struggle with multi-instance scenarios, where multiple parts of a reference input must be edited independently without semantic interference. We identify this limitation as a consequence of globally conditioned velocity fields and joint attention mechanisms, which entangle concurrent edits. To address this issue, we introduce Instance-Disentangled Attention, a mechanism that partitions joint attention operations, enforcing binding between instance-specific textual instructions and spatial regions during velocity field estimation. We evaluate our approach on both natural image editing and a newly introduced benchmark of text-dense infographics with region-level editing instructions. Experimental results demonstrate that our approach promotes edit disentanglement and locality while preserving global output coherence, enabling single-pass, instance-level editing.
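
To make the idea of partitioning joint attention concrete, the following is a minimal sketch in NumPy, not the authors' implementation. It assumes one particular masking rule that the abstract does not spell out: each instruction's text tokens attend only to themselves and to the image tokens of their own region, image tokens attend to the text of their own instance only, and image-to-image attention is left unrestricted so the output can stay globally coherent. The function names `build_instance_mask` and `masked_attention`, the token layout (text tokens first, then image tokens), and the use of `-1` for background are all hypothetical choices made for illustration.

```python
import numpy as np

def build_instance_mask(text_lens, region_ids):
    """Boolean joint-attention mask; True means 'query row may attend to key column'.

    text_lens:  number of text tokens per instance instruction, e.g. [3, 2]
    region_ids: per-image-token instance index (0..K-1), or -1 for background
    (Hypothetical layout: all text tokens first, then all image tokens.)
    """
    region_ids = np.asarray(region_ids)
    # Instance index owning each text token, e.g. [0, 0, 0, 1, 1]
    text_owner = np.repeat(np.arange(len(text_lens)), text_lens)
    n_txt, n_img = text_owner.size, region_ids.size
    mask = np.zeros((n_txt + n_img, n_txt + n_img), dtype=bool)

    # text -> text: only within the same instruction
    mask[:n_txt, :n_txt] = text_owner[:, None] == text_owner[None, :]
    # text <-> image: only between an instruction and its own region
    bind = text_owner[:, None] == region_ids[None, :]
    mask[:n_txt, n_txt:] = bind
    mask[n_txt:, :n_txt] = bind.T
    # image -> image: unrestricted (assumed, to keep global coherence)
    mask[n_txt:, n_txt:] = True
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed pairs set to -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Two instructions (3 and 2 text tokens) and six image tokens:
# two in region 0, two in region 1, two in the background.
rng = np.random.default_rng(0)
mask = build_instance_mask([3, 2], [0, 0, 1, 1, -1, -1])
tokens = rng.normal(size=(mask.shape[0], 8))
out, weights = masked_attention(tokens, tokens, tokens, mask)
```

Under this assumed rule, the tokens of one instruction receive exactly zero attention weight from image tokens in any other region, which is one simple way to prevent semantic interference between concurrent edits. In a flow matching editor, a mask of this kind would be applied inside each joint attention layer of the velocity network at every integration step; how the paper actually partitions attention may differ.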

Problem

Research questions and friction points this paper is trying to address.

flow matching

multi-instance editing

image editing

semantic interference

instance disentanglement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching

Multi-Instance Editing

Instance-Disentangled Attention