🤖 AI Summary
Pretrained flow-based generative models offer no direct way to disentangle and customize the underlying concepts of a single real image. Using differential probing, this work shows that generation unfolds in three stages, and it introduces a stage-aware optimization strategy that learns concept-specific semantic offsets from one reference image. The authors also propose ConceptWeaver Guidance (CWG), a mechanism that injects these offsets at the appropriate generative stage. Because content concepts become naturally disentangled during the instantiation stage, the approach supports high-fidelity, compositional, multi-granularity image synthesis and editing.
📝 Abstract
Pre-trained flow-based models excel at synthesizing complex scenes yet lack a direct mechanism for disentangling and customizing their underlying concepts from one-shot real-world sources. To demystify the generative process, we first introduce a novel differential probing technique to isolate and analyze the influence of individual concept tokens on the velocity field over time. This investigation yields a critical insight: the generative process is not monolithic but unfolds in three distinct stages. An initial Blueprint Stage establishes low-frequency structure, followed by a pivotal Instantiation Stage where content concepts emerge with peak intensity and become naturally disentangled, creating an optimal window for manipulation. A final, concept-insensitive Refinement Stage then synthesizes fine-grained details. Guided by this discovery, we propose ConceptWeaver, a framework for one-shot concept disentanglement. ConceptWeaver learns concept-specific semantic offsets from a single reference image using a stage-aware optimization strategy that aligns with the three-stage framework. These learned offsets are then deployed during inference via our novel ConceptWeaver Guidance (CWG) mechanism, which strategically injects them at the appropriate generative stage. Extensive experiments validate that ConceptWeaver enables high-fidelity, compositional synthesis and editing, demonstrating that understanding and leveraging the intrinsic, staged nature of flow models is key to unlocking precise, multi-granularity content manipulation.
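The two ideas in the abstract, probing a token's influence on the velocity field over time and injecting an offset only within a chosen stage, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration rather than the authors' implementation: `toy_velocity` is a hypothetical stand-in for a pretrained flow model, the mid-trajectory peak is hard-coded, the stage window `(0.3, 0.7)` is arbitrary, and the guidance rule is a generic difference-based form, not the paper's CWG formula.

```python
import numpy as np

def toy_velocity(x, t, tokens):
    """Hypothetical stand-in for a flow model's velocity field v(x, t, c)."""
    # Assumption: conditioning influence peaks mid-trajectory (bump around t = 0.5).
    gate = np.exp(-((t - 0.5) ** 2) / 0.02)
    return -x + gate * tokens.sum(axis=0)

def differential_probe(x, t, tokens, idx):
    """Norm of the change in velocity when token `idx` is zeroed out."""
    ablated = tokens.copy()
    ablated[idx] = 0.0
    diff = toy_velocity(x, t, tokens) - toy_velocity(x, t, ablated)
    return float(np.linalg.norm(diff))

def guided_velocity(x, t, tokens, offset, window=(0.3, 0.7), scale=1.0):
    """Add the offset-induced velocity change only inside the stage window."""
    base = toy_velocity(x, t, tokens)
    if window[0] <= t <= window[1]:
        shifted = toy_velocity(x, t, tokens + offset)
        return base + scale * (shifted - base)
    return base

rng = np.random.default_rng(0)
x = rng.normal(size=8)              # toy latent state
tokens = rng.normal(size=(4, 8))    # four toy concept-token embeddings
offset = np.full(8, 0.1)            # stand-in for a learned semantic offset

# Probe one token's influence across the trajectory and locate its peak.
times = np.linspace(0.0, 1.0, 11)
influence = [differential_probe(x, t, tokens, idx=1) for t in times]
peak_t = float(times[int(np.argmax(influence))])
```

In this toy setup the probed influence peaks at the midpoint only because the stand-in model was built that way. With a real model, the probe would be run against the actual velocity network, and the stage boundaries would be read off the measured influence curves.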