ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address concept entanglement in multi-concept personalization for text-to-image diffusion models, this paper proposes a fine-grained disentanglement framework that separates concepts through both training and inference. The method comprises two key components: (1) Token-wise Value Adaptation (ToVA), a merging-free training scheme that adapts only the value projections in cross-attention layers in a concept-aware manner; and (2) Latent Optimization for Disentangled Attention (LODA), which optimizes the input latent at inference time to alleviate concept interference in attention maps. Because ToVA avoids model merging and LODA requires no additional fine-tuning, the framework enables independent control over individual concepts during generation. Experiments across diverse multi-concept composition scenarios demonstrate that the approach significantly suppresses unintended cross-concept interference, outperforming existing state-of-the-art methods in both qualitative and quantitative evaluations.
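The value-only adaptation that the summary describes can be sketched in a few lines. The following is a minimal, hypothetical PyTorch sketch, not the paper's actual implementation: the class name, the per-concept linear adapters, and the `concept_ids` interface (where `-1` marks ordinary prompt tokens and `c >= 0` marks tokens bound to concept `c`) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TokenwiseValueAdaptation(nn.Module):
    """Illustrative sketch: adapt only the value (V) projection of
    cross-attention, per concept token, leaving keys and queries untouched."""

    def __init__(self, dim: int, num_concepts: int):
        super().__init__()
        self.base_v = nn.Linear(dim, dim, bias=False)  # frozen base V projection
        # one lightweight value adapter per personalized concept (assumed form)
        self.concept_v = nn.ModuleList(
            nn.Linear(dim, dim, bias=False) for _ in range(num_concepts)
        )

    def forward(self, text_emb: torch.Tensor, concept_ids: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, seq, dim); concept_ids: (batch, seq),
        # -1 for ordinary tokens, c >= 0 for tokens bound to concept c
        v = self.base_v(text_emb)
        for c, proj in enumerate(self.concept_v):
            mask = (concept_ids == c).unsqueeze(-1)   # (batch, seq, 1)
            v = torch.where(mask, proj(text_emb), v)  # swap V only at concept tokens
        return v
```

Keeping the key projection fixed is the point of the design: queries and keys (and hence the attention map itself) are computed exactly as in the base model, so each concept's adapter can only change *what* is written at its own tokens, not *where* attention lands.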

📝 Abstract
In recent years, multi-concept personalization for text-to-image (T2I) diffusion models, which aims to represent several subjects in a single image, has gained increasing attention. The main challenge of this task is "concept mixing", where multiple learned concepts interfere or blend undesirably in the output image. To address this issue, we present ConceptSplit, a novel framework that splits individual concepts through training and inference. Our framework comprises two key components. First, we introduce Token-wise Value Adaptation (ToVA), a merging-free training method that focuses exclusively on adapting the value projection in cross-attention. Our empirical analysis shows that modifying the key projection, a common approach in existing methods, can disrupt the attention mechanism and lead to concept mixing. Second, we propose Latent Optimization for Disentangled Attention (LODA), which alleviates attention entanglement during inference by optimizing the input latent. Through extensive qualitative and quantitative experiments, we demonstrate that ConceptSplit achieves robust multi-concept personalization, mitigating unintended concept interference. Code is available at https://github.com/KU-VGI/ConceptSplit
Problem

Research questions and friction points this paper is trying to address.

Addresses concept mixing in multi-concept text-to-image diffusion models
Prevents learned concepts from interfering during image generation
Enables robust personalization of multiple subjects in single images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-wise Value Adaptation for cross-attention
Latent Optimization for Disentangled Attention
Splits concepts through training and inference
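The second innovation, inference-time latent optimization, can be illustrated with a short sketch. Everything below is an assumption for illustration: the function name, the `attn_fn` interface (returning cross-attention probabilities of shape `(batch, heads, pixels, tokens)`), and in particular the loss, a simple pairwise overlap of per-concept attention maps, which is an illustrative choice and not necessarily the paper's exact objective.

```python
import torch

def disentangle_latent(latent, attn_fn, concept_token_ids, steps=20, lr=0.05):
    """Illustrative sketch of inference-time latent optimization.

    latent: the diffusion input latent to refine
    attn_fn(latent): differentiable map to cross-attention probabilities,
        shape (batch, heads, pixels, tokens)  [assumed interface]
    concept_token_ids: list of index tensors, one per concept
    """
    z = latent.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        attn = attn_fn(z)
        # total attention each concept's tokens receive at every pixel
        maps = [attn[..., idx].sum(dim=-1) for idx in concept_token_ids]
        # penalize pixels where two concepts' attention maps overlap
        loss = sum(
            (maps[i] * maps[j]).mean()
            for i in range(len(maps)) for j in range(i + 1, len(maps))
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

Because only the latent is updated, the personalized weights stay untouched; each concept's attention region is nudged away from the others' before (or during) denoising.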
Habin Lim (Korea University)
Yeongseob Won (Kyung Hee University)
Juwon Seo (Kyung Hee University)
Gyeong-Moon Park (Assistant Professor, Department of AI, Korea University)
AI · Deep Learning · Robotics