🤖 AI Summary
This work addresses the challenge of modular, multi-concept customization in diffusion models. To enable instant, on-the-fly fusion of heterogeneous concepts (persons, objects, scenes, and artistic styles), we propose BlockLoRA, a framework that mitigates inter-concept interference and identity loss. Methodologically, we introduce Randomized Output Erasure (ROE), a mechanism that suppresses interference between independently customized models, and design Blockwise LoRA Parameterization to preserve identity fidelity during parameter merging. Crucially, our approach requires no additional training at merge time. It achieves high-fidelity composition of up to 15 distinct concepts, outperforming state-of-the-art methods on concept stylization and multi-concept customization benchmarks.
📝 Abstract
Recent diffusion model customization has shown impressive results in incorporating subject or style concepts from a handful of images. However, the modular composition of multiple concepts into a single customized model, which aims to efficiently merge independently trained concepts without compromising their identities, remains unresolved. Modular customization is essential for applications such as concept stylization and multi-concept customization using concepts trained by different users. Existing post-training methods are confined to a fixed set of concepts, and any new combination requires another round of retraining. In contrast, instant merging methods often cause identity loss and interference among the merged concepts, and are usually limited to a small number of concepts. To address these issues, we propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving each concept's identity. Through a careful analysis of the underlying cause of interference, we develop the Randomized Output Erasure technique to minimize interference between different customized models. Additionally, we propose Blockwise LoRA Parameterization to reduce identity loss during instant model merging. Extensive experiments validate the effectiveness of BlockLoRA, which can instantly merge 15 concepts of people, subjects, scenes, and styles with high fidelity.
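The paper's exact parameterization is not reproduced here, but the basic algebra of instant LoRA merging can be sketched. Each adapter contributes a low-rank weight update ΔW = B·A; a naive instant merge simply sums these updates, while a blockwise-style merge keeps each adapter in its own disjoint rank block of one wider LoRA so the concepts' factors remain separable. The sketch below (numpy, with illustrative dimensions; all variable names are our own, not the paper's) shows that the two forms produce the same summed delta, while only the blockwise form retains per-concept structure:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2  # illustrative layer sizes and per-concept LoRA rank

# Two independently trained LoRA adapters, each a (B, A) factor pair.
loras = [(rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
         for _ in range(2)]

# Naive instant merge: sum the low-rank updates into one dense delta.
# The individual concepts are no longer recoverable from the result.
delta_naive = sum(B @ A for B, A in loras)

# Blockwise-style merge: concatenate factors so each adapter occupies
# a disjoint rank block of a single wider LoRA.
B_cat = np.concatenate([B for B, _ in loras], axis=1)  # (d_out, 2r)
A_cat = np.concatenate([A for _, A in loras], axis=0)  # (2r, d_in)
delta_block = B_cat @ A_cat

# Block-matrix multiplication makes the two deltas identical, but the
# blockwise form keeps each concept's factors addressable for later
# removal or re-weighting.
assert np.allclose(delta_naive, delta_block)
print(delta_block.shape)
```

This only illustrates the merging arithmetic; the paper's contributions (Randomized Output Erasure and the specific Blockwise LoRA Parameterization) concern how interference and identity loss are controlled beyond this plain sum.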