FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image editing models struggle to preserve character and object consistency across iterative editing rounds. To address this, we propose a flow-matching-based framework that unifies generation and editing and accepts multimodal (text + image) contextual inputs, supporting local editing, global redrawing, character/style referencing, and text-driven editing within one model. Our approach applies flow matching to in-context image editing, combining latent-space modeling, concatenation of semantic context sequences, and a multi-task unified architecture, together with an accelerated sampling scheme in place of conventional iterative sampling. We introduce KontextBench, a benchmark covering five editing task categories (1,026 samples). Experiments demonstrate that our method surpasses state-of-the-art approaches in both single-round editing fidelity and multi-round consistency, with significantly faster inference, enabling real-time interactive editing and rapid prototyping.
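The flow-matching objective mentioned above can be illustrated with a minimal rectified-flow-style sketch: interpolate linearly between a noise sample and the data latents, and regress the constant velocity along that straight path. This is a toy illustration under stated assumptions, not FLUX.1 Kontext's actual schedule, conditioning, or loss weighting, none of which are given on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x1, t):
    """Rectified-flow style targets: interpolate between noise x0 and
    data latents x1, and regress the constant velocity (x1 - x0)."""
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise endpoint
    xt = (1.0 - t) * x0 + t * x1        # point on the straight path at time t
    v_target = x1 - x0                  # velocity the network should predict
    return xt, v_target

# toy latent batch: 4 "images", each an 8-dim latent
x1 = rng.standard_normal((4, 8))
t = rng.uniform(size=(4, 1))            # one timestep per sample
xt, v = flow_matching_targets(x1, t)
```

A useful sanity check on the construction: stepping from `xt` with velocity `v` for the remaining time `1 - t` recovers `x1` exactly, which is what makes the straight-path parameterization attractive for few-step sampling.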

📝 Abstract
We present evaluation results for FLUX.1 Kontext, a generative flow matching model that unifies image generation and editing. The model generates novel output views by incorporating semantic context from text and image inputs. Using a simple sequence concatenation approach, FLUX.1 Kontext handles both local editing and generative in-context tasks within a single unified architecture. Compared to current editing models that exhibit degradation in character consistency and stability across multiple turns, we observe that FLUX.1 Kontext improved preservation of objects and characters, leading to greater robustness in iterative workflows. The model achieves competitive performance with current state-of-the-art systems while delivering significantly faster generation times, enabling interactive applications and rapid prototyping workflows. To validate these improvements, we introduce KontextBench, a comprehensive benchmark with 1,026 image-prompt pairs covering five task categories: local editing, global editing, character reference, style reference, and text editing. Detailed evaluations show the superior performance of FLUX.1 Kontext in terms of both single-turn quality and multi-turn consistency, setting new standards for unified image processing models.
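The "simple sequence concatenation approach" from the abstract can be sketched as follows: patchify the context-image latents into tokens and append them to the target tokens as one joint sequence for the transformer to attend over. All shapes, the patch size, and the function names here are hypothetical; the actual tokenizer, positional encoding, and model are not reproduced on this page.

```python
import numpy as np

def to_tokens(latent, patch=2):
    """Flatten a latent feature map (C, H, W) into a token sequence of
    shape (H/patch * W/patch, C * patch * patch) via non-overlapping patches."""
    c, h, w = latent.shape
    latent = latent.reshape(c, h // patch, patch, w // patch, patch)
    latent = latent.transpose(1, 3, 0, 2, 4)    # -> (H/p, W/p, C, p, p)
    return latent.reshape(-1, c * patch * patch)

def concat_context(target_latent, context_latent):
    """In-context conditioning: concatenate context-image tokens after the
    target tokens along the sequence axis, forming one joint sequence."""
    tgt = to_tokens(target_latent)
    ctx = to_tokens(context_latent)
    return np.concatenate([tgt, ctx], axis=0)

# two 16-channel 8x8 latent maps -> 16 tokens each of dim 16*2*2 = 64
seq = concat_context(np.zeros((16, 8, 8)), np.zeros((16, 8, 8)))
```

The appeal of this design, as the abstract suggests, is that the same architecture serves both plain generation (empty context) and editing (context tokens present) without task-specific branches.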
Problem

Research questions and friction points this paper is trying to address.

Unifies image generation and editing in one model
Improves character consistency in iterative workflows
Achieves fast generation for interactive applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative flow matching for unified image tasks
Sequence concatenation handles local and global editing
Faster generation enabling interactive applications
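The fast-generation point above comes down to integrating a learned velocity field with very few steps. A minimal sketch, assuming a plain Euler integrator and a toy stand-in velocity field (the distilled sampler FLUX.1 Kontext actually uses is not described on this page):

```python
import numpy as np

def euler_sample(v_field, x0, steps=4):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)
    with a small, fixed number of Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * v_field(x, t)
    return x

# toy case: a constant straight-line velocity toward a fixed target latent,
# mirroring the rectified-flow ideal where the path from noise to data is linear
start = np.zeros(8)
target = np.full(8, 3.0)
v_const = lambda x, t: target - start
x = euler_sample(v_const, start, steps=4)
# for a constant velocity field, Euler integration is exact, so x equals target
```

When the learned velocity field is close to constant along the path, a handful of steps suffices, which is what makes interactive-speed editing plausible.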
Authors (Black Forest Labs)
Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Muller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, Luke Smith