🤖 AI Summary
When composing multiple LoRA modules for multi-concept image generation, existing methods suffer from identity loss and feature leakage. We identify two root causes: (1) token-level interference among modules, and (2) spatial misalignment between rare tokens and their corresponding concept regions in attention maps. To address these issues, we propose Token-Aware LoRA (TARA), the first method to combine token masking constraints with spatial attention alignment, enabling plug-and-play composition of pre-trained LoRA modules without additional fine-tuning. TARA sharpens each module's focus on its concept-specific rare token, improving identity fidelity and suppressing cross-module interference. Experiments show that TARA significantly improves visual consistency and concept controllability while preserving generation quality, establishing an efficient, scalable paradigm for modular personalization in diffusion models.
📝 Abstract
Personalized text-to-image generation aims to synthesize novel images of a specific subject or style using only a few reference images. Recent methods based on Low-Rank Adaptation (LoRA) enable efficient single-concept customization by injecting lightweight, concept-specific adapters into pre-trained diffusion models. However, combining multiple LoRA modules for multi-concept generation often leads to identity loss and visual feature leakage. In this work, we identify two key issues behind these failures: (1) token-wise interference among different LoRA modules, and (2) spatial misalignment between the attention map of a rare token and its corresponding concept-specific region. To address these issues, we propose Token-Aware LoRA (TARA), which introduces a token mask that explicitly constrains each module to focus on its associated rare token, avoiding interference, and a training objective that encourages the spatial attention of a rare token to align with its concept region. Our method enables training-free multi-concept composition by directly injecting multiple independently trained TARA modules at inference time. Experimental results demonstrate that TARA enables efficient multi-concept inference and effectively preserves the visual identity of each concept by avoiding mutual interference between LoRA modules. The code and models are available at https://github.com/YuqiPeng77/TARA.
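The token-mask idea can be sketched in a few lines: each LoRA module's low-rank update is gated so it only modifies the hidden states at the positions of its own rare token, which is what prevents two composed modules from interfering on shared tokens. This is a minimal NumPy illustration under assumed shapes and field names (`down`, `up`, `scale`, `rare_token` are hypothetical), not the authors' implementation:

```python
import numpy as np

def token_masked_lora(h, loras, token_ids):
    """Apply several LoRA updates to text-encoder hidden states, each gated
    by a token mask so a module only touches its own rare-token positions.

    h:         (seq_len, dim) hidden states
    token_ids: (seq_len,) prompt token ids
    loras:     list of dicts with keys "down" (dim, r), "up" (r, dim),
               "scale" (float), "rare_token" (int) -- illustrative schema
    """
    out = h.copy()
    for lora in loras:
        # 1 at positions holding this module's rare token, 0 elsewhere
        mask = (token_ids == lora["rare_token"]).astype(h.dtype)[:, None]
        # standard low-rank update: h @ A @ B, scaled
        delta = (h @ lora["down"]) @ lora["up"] * lora["scale"]
        # gating confines the update to the module's own token positions
        out += mask * delta
    return out
```

With two such modules bound to different rare tokens, each update lands only on its own token's positions, so composing them at inference requires no joint fine-tuning.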