ConceptSplit: Decoupled Multi-Concept Personalization of Diffusion Models via Token-wise Adaptation and Attention Disentanglement

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address concept entanglement in multi-concept personalization for text-to-image diffusion models, this paper proposes a fine-grained disentanglement framework that separates concepts through both training and inference. The method comprises two key components: (1) Token-wise Value Adaptation (ToVA), a merging-free training scheme that adapts only the value projections in cross-attention layers in a concept-aware manner; and (2) Latent Optimization for Disentangled Attention (LODA), which optimizes the input latent at inference time to alleviate concept interference in attention maps. Because ToVA avoids model merging and LODA requires no additional fine-tuning, the framework enables independent control over individual concepts during generation. Experiments across diverse multi-concept composition scenarios demonstrate that the approach significantly suppresses unintended cross-concept interference, outperforming existing state-of-the-art methods in both qualitative and quantitative evaluations.
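The value-only adaptation that the summary describes can be sketched in a few lines. The following is a minimal, hypothetical PyTorch sketch, not the paper's actual implementation: the class name, the per-concept linear adapters, and the `concept_ids` interface (where `-1` marks ordinary prompt tokens and `c >= 0` marks tokens bound to concept `c`) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TokenwiseValueAdaptation(nn.Module):
    """Illustrative sketch: adapt only the value (V) projection of
    cross-attention, per concept token, leaving keys and queries untouched."""

    def __init__(self, dim: int, num_concepts: int):
        super().__init__()
        self.base_v = nn.Linear(dim, dim, bias=False)  # frozen base V projection
        # one lightweight value adapter per personalized concept (assumed form)
        self.concept_v = nn.ModuleList(
            nn.Linear(dim, dim, bias=False) for _ in range(num_concepts)
        )

    def forward(self, text_emb: torch.Tensor, concept_ids: torch.Tensor) -> torch.Tensor:
        # text_emb: (batch, seq, dim); concept_ids: (batch, seq),
        # -1 for ordinary tokens, c >= 0 for tokens bound to concept c
        v = self.base_v(text_emb)
        for c, proj in enumerate(self.concept_v):
            mask = (concept_ids == c).unsqueeze(-1)   # (batch, seq, 1)
            v = torch.where(mask, proj(text_emb), v)  # swap V only at concept tokens
        return v
```

Keeping the key projection fixed is the point of the design: queries and keys (and hence the attention map itself) are computed exactly as in the base model, so each concept's adapter can only change *what* is written at its own tokens, not *where* attention lands.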

📝 Abstract
In recent years, multi-concept personalization for text-to-image (T2I) diffusion models, which aims to represent several subjects in a single image, has gained increasing attention. The main challenge of this task is "concept mixing", where multiple learned concepts interfere or blend undesirably in the output image. To address this issue, we present ConceptSplit, a novel framework that splits individual concepts through training and inference. Our framework comprises two key components. First, we introduce Token-wise Value Adaptation (ToVA), a merging-free training method that focuses exclusively on adapting the value projection in cross-attention. Our empirical analysis shows that modifying the key projection, a common approach in existing methods, can disrupt the attention mechanism and lead to concept mixing. Second, we propose Latent Optimization for Disentangled Attention (LODA), which alleviates attention entanglement during inference by optimizing the input latent. Through extensive qualitative and quantitative experiments, we demonstrate that ConceptSplit achieves robust multi-concept personalization, mitigating unintended concept interference. Code is available at https://github.com/KU-VGI/ConceptSplit
Problem

Research questions and friction points this paper is trying to address.

Addresses concept mixing in multi-concept text-to-image diffusion models
Prevents learned concepts from interfering during image generation
Enables robust personalization of multiple subjects in single images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-wise Value Adaptation for cross-attention
Latent Optimization for Disentangled Attention
Splits concepts through training and inference
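The second innovation, inference-time latent optimization, can be illustrated with a short sketch. Everything below is an assumption for illustration: the function name, the `attn_fn` interface (returning cross-attention probabilities of shape `(batch, heads, pixels, tokens)`), and in particular the loss, a simple pairwise overlap of per-concept attention maps, which is an illustrative choice and not necessarily the paper's exact objective.

```python
import torch

def disentangle_latent(latent, attn_fn, concept_token_ids, steps=20, lr=0.05):
    """Illustrative sketch of inference-time latent optimization.

    latent: the diffusion input latent to refine
    attn_fn(latent): differentiable map to cross-attention probabilities,
        shape (batch, heads, pixels, tokens)  [assumed interface]
    concept_token_ids: list of index tensors, one per concept
    """
    z = latent.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        attn = attn_fn(z)
        # total attention each concept's tokens receive at every pixel
        maps = [attn[..., idx].sum(dim=-1) for idx in concept_token_ids]
        # penalize pixels where two concepts' attention maps overlap
        loss = sum(
            (maps[i] * maps[j]).mean()
            for i in range(len(maps)) for j in range(i + 1, len(maps))
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```

Because only the latent is updated, the personalized weights stay untouched; each concept's attention region is nudged away from the others' before (or during) denoising.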
Habin Lim (Korea University)
Yeongseob Won (Kyung Hee University)
Juwon Seo (Kyung Hee University)
Gyeong-Moon Park (Assistant Professor, Department of AI, Korea University)
AI · Deep Learning · Robotics