🤖 AI Summary
Existing personalized diffusion models struggle to simultaneously achieve precise spatial layout control and faithful preservation of multiple subjects' identities. This paper proposes a tuning-free framework for multi-subject personalized image generation with two key innovations: (1) a dynamic-static complementary visual refining module that captures temporal and pose variations of reference subjects through dynamic feature modeling while distilling static identity cues, injected via cross-attention guidance; and (2) a dual layout control mechanism that imposes explicit spatial constraints during both training and inference. Together, these enable precise layout controllability and consistent identity preservation across subjects. Extensive evaluations on multiple benchmarks show significant improvements, including +23.6% in ID-Retrieval accuracy and +31.4% in IoU. The framework supports arbitrary subject identity specification and customizable spatial layouts, even in complex scenes.
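The summary does not spell out how the layout constraint is enforced, but such control is commonly implemented as a region mask on cross-attention between image latents and subject reference tokens. The sketch below illustrates that idea only: the function name `layout_masked_cross_attention`, the tensor shapes, and the normalized box format are all assumptions for illustration, not LCP-Diffusion's actual implementation.

```python
import torch


def layout_masked_cross_attention(q, k, v, boxes, h, w):
    """Illustrative sketch: each subject's reference tokens may only be
    attended to by image locations inside that subject's bounding box.

    q:     (B, N_img, d)   image-latent queries, N_img = h * w
    k, v:  (B, S, L, d)    reference keys/values for S subjects, L tokens each
    boxes: (B, S, 4)       normalized boxes (x0, y0, x1, y1) per subject
    """
    B, S, L, d = k.shape
    k = k.reshape(B, S * L, d)
    v = v.reshape(B, S * L, d)

    # Attention logits between every image position and every subject token.
    logits = torch.einsum("bnd,bmd->bnm", q, k) / d ** 0.5      # (B, N_img, S*L)

    # Normalized (x, y) coordinates of each flattened image position.
    ys = torch.linspace(0, 1, h, device=q.device).view(h, 1).expand(h, w).reshape(-1)
    xs = torch.linspace(0, 1, w, device=q.device).view(1, w).expand(h, w).reshape(-1)

    # Spatial mask: True where an image position lies inside a subject's box.
    mask = torch.zeros(B, h * w, S, dtype=torch.bool, device=q.device)
    for b in range(B):
        for s in range(S):
            x0, y0, x1, y1 = boxes[b, s]
            mask[b, :, s] = (xs >= x0) & (xs <= x1) & (ys >= y0) & (ys <= y1)
    mask = mask.unsqueeze(-1).expand(B, h * w, S, L).reshape(B, h * w, S * L)

    logits = logits.masked_fill(~mask, float("-inf"))
    # Positions outside every box would softmax over all -inf; fall back to
    # uniform attention there to avoid NaNs in this sketch.
    logits = torch.where(mask.any(-1, keepdim=True), logits, torch.zeros_like(logits))
    attn = logits.softmax(dim=-1)
    return torch.einsum("bnm,bmd->bnd", attn, v)                # (B, N_img, d)
```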
📝 Abstract
Diffusion models have significantly advanced text-to-image generation, laying the foundation for the development of personalized generative frameworks. However, existing methods lack precise layout controllability and overlook the potential of dynamic features of reference subjects for improving fidelity. In this work, we propose the Layout-Controllable Personalized Diffusion (LCP-Diffusion) model, a novel framework that integrates subject identity preservation with flexible layout guidance in a tuning-free manner. Our model employs a Dynamic-Static Complementary Visual Refining module to comprehensively capture the intricate details of reference subjects, and introduces a Dual Layout Control mechanism to enforce robust spatial control across both training and inference stages. Extensive experiments validate that LCP-Diffusion excels in both identity preservation and layout controllability. To the best of our knowledge, this is a pioneering work enabling users to "create anything anywhere".
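The abstract describes the Dynamic-Static Complementary Visual Refining module only at a high level. The sketch below shows one plausible reading of it, assuming static identity tokens are pooled over multiple reference views and then refined by attending to the per-view (dynamic) features; the `DynamicStaticFuser` class, its shapes, and the fusion scheme are hypothetical, not the authors' architecture.

```python
import torch
import torch.nn as nn


class DynamicStaticFuser(nn.Module):
    """Hypothetical dynamic-static complementary refinement step: view-averaged
    (static) identity tokens query the pool of per-view (dynamic) features,
    distilling details that remain consistent across pose/temporal variation."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.static_proj = nn.Linear(dim, dim)
        self.dynamic_proj = nn.Linear(dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, ref_feats: torch.Tensor) -> torch.Tensor:
        # ref_feats: (B, V, L, D) features from V reference views of a subject.
        B, V, L, D = ref_feats.shape
        static = self.static_proj(ref_feats.mean(dim=1))              # (B, L, D)
        dynamic = self.dynamic_proj(ref_feats.reshape(B, V * L, D))   # (B, V*L, D)
        refined, _ = self.attn(static, dynamic, dynamic)              # static queries dynamic pool
        return self.norm(static + refined)                            # (B, L, D) subject tokens


# Usage sketch: the resulting tokens would condition the diffusion U-Net's
# cross-attention layers, e.g. for 4 CLIP-encoded reference views:
# fuser = DynamicStaticFuser(dim=768)
# subject_tokens = fuser(torch.randn(2, 4, 257, 768))
```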