FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image editing models struggle to preserve character and object consistency across iterative editing rounds. To address this, we propose a flow-matching-based framework that unifies generation and editing and accepts multimodal (text + image) contextual inputs, supporting local editing, global redrawing, character/style referencing, and text-driven editing within one model. Our approach applies flow matching to in-context image editing, combining latent-space modeling, concatenation of semantic context sequences, and a multi-task unified architecture, together with an accelerated sampling scheme in place of conventional iterative sampling. We introduce KontextBench, a benchmark covering five editing task categories (1,026 samples). Experiments demonstrate that our method surpasses state-of-the-art approaches in both single-round editing fidelity and multi-round consistency, with significantly faster inference, enabling real-time interactive editing and rapid prototyping.
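The flow-matching objective mentioned above can be illustrated with a minimal rectified-flow-style sketch: interpolate linearly between a noise sample and the data latents, and regress the constant velocity along that straight path. This is a toy illustration under stated assumptions, not FLUX.1 Kontext's actual schedule, conditioning, or loss weighting, none of which are given on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x1, t):
    """Rectified-flow style targets: interpolate between noise x0 and
    data latents x1, and regress the constant velocity (x1 - x0)."""
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise endpoint
    xt = (1.0 - t) * x0 + t * x1        # point on the straight path at time t
    v_target = x1 - x0                  # velocity the network should predict
    return xt, v_target

# toy latent batch: 4 "images", each an 8-dim latent
x1 = rng.standard_normal((4, 8))
t = rng.uniform(size=(4, 1))            # one timestep per sample
xt, v = flow_matching_targets(x1, t)
```

A useful sanity check on the construction: stepping from `xt` with velocity `v` for the remaining time `1 - t` recovers `x1` exactly, which is what makes the straight-path parameterization attractive for few-step sampling.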

📝 Abstract
We present evaluation results for FLUX.1 Kontext, a generative flow matching model that unifies image generation and editing. The model generates novel output views by incorporating semantic context from text and image inputs. Using a simple sequence concatenation approach, FLUX.1 Kontext handles both local editing and generative in-context tasks within a single unified architecture. Compared to current editing models that exhibit degradation in character consistency and stability across multiple turns, we observe that FLUX.1 Kontext improved preservation of objects and characters, leading to greater robustness in iterative workflows. The model achieves competitive performance with current state-of-the-art systems while delivering significantly faster generation times, enabling interactive applications and rapid prototyping workflows. To validate these improvements, we introduce KontextBench, a comprehensive benchmark with 1,026 image-prompt pairs covering five task categories: local editing, global editing, character reference, style reference, and text editing. Detailed evaluations show the superior performance of FLUX.1 Kontext in terms of both single-turn quality and multi-turn consistency, setting new standards for unified image processing models.
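The "simple sequence concatenation approach" from the abstract can be sketched as follows: patchify the context-image latents into tokens and append them to the target tokens as one joint sequence for the transformer to attend over. All shapes, the patch size, and the function names here are hypothetical; the actual tokenizer, positional encoding, and model are not reproduced on this page.

```python
import numpy as np

def to_tokens(latent, patch=2):
    """Flatten a latent feature map (C, H, W) into a token sequence of
    shape (H/patch * W/patch, C * patch * patch) via non-overlapping patches."""
    c, h, w = latent.shape
    latent = latent.reshape(c, h // patch, patch, w // patch, patch)
    latent = latent.transpose(1, 3, 0, 2, 4)    # -> (H/p, W/p, C, p, p)
    return latent.reshape(-1, c * patch * patch)

def concat_context(target_latent, context_latent):
    """In-context conditioning: concatenate context-image tokens after the
    target tokens along the sequence axis, forming one joint sequence."""
    tgt = to_tokens(target_latent)
    ctx = to_tokens(context_latent)
    return np.concatenate([tgt, ctx], axis=0)

# two 16-channel 8x8 latent maps -> 16 tokens each of dim 16*2*2 = 64
seq = concat_context(np.zeros((16, 8, 8)), np.zeros((16, 8, 8)))
```

The appeal of this design, as the abstract suggests, is that the same architecture serves both plain generation (empty context) and editing (context tokens present) without task-specific branches.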
Problem

Research questions and friction points this paper is trying to address.

Unifies image generation and editing in one model
Improves character consistency in iterative workflows
Achieves fast generation for interactive applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative flow matching for unified image tasks
Sequence concatenation handles local and global editing
Faster generation enabling interactive applications
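The fast-generation point above comes down to integrating a learned velocity field with very few steps. A minimal sketch, assuming a plain Euler integrator and a toy stand-in velocity field (the distilled sampler FLUX.1 Kontext actually uses is not described on this page):

```python
import numpy as np

def euler_sample(v_field, x0, steps=4):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)
    with a small, fixed number of Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * v_field(x, t)
    return x

# toy case: a constant straight-line velocity toward a fixed target latent,
# mirroring the rectified-flow ideal where the path from noise to data is linear
start = np.zeros(8)
target = np.full(8, 3.0)
v_const = lambda x, t: target - start
x = euler_sample(v_const, start, steps=4)
# for a constant velocity field, Euler integration is exact, so x equals target
```

When the learned velocity field is close to constant along the path, a handful of steps suffices, which is what makes interactive-speed editing plausible.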
Authors (Black Forest Labs)
Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Muller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, Luke Smith