🤖 AI Summary
When composing multiple LoRA modules for multi-concept image generation, existing methods suffer from identity loss and feature leakage. We identify two root causes: (1) token-level interference among modules, and (2) spatial misalignment between rare tokens and their corresponding concept regions in attention maps. To address these issues, we propose Token-Aware LoRA (TARA), the first method to combine token masking constraints with spatial attention alignment, enabling plug-and-play composition of pre-trained LoRA modules without additional fine-tuning. TARA sharpens each module's focus on its concept-specific rare token, improving identity fidelity and suppressing cross-module interference. Experiments show that TARA significantly improves visual consistency and concept controllability while preserving generation quality, establishing an efficient, scalable paradigm for modular personalization in diffusion models.
📝 Abstract
Personalized text-to-image generation aims to synthesize novel images of a specific subject or style using only a few reference images. Recent methods based on Low-Rank Adaptation (LoRA) enable efficient single-concept customization by injecting lightweight, concept-specific adapters into pre-trained diffusion models. However, combining multiple LoRA modules for multi-concept generation often leads to identity loss and visual feature leakage. In this work, we identify two key issues behind these failures: (1) token-wise interference among different LoRA modules, and (2) spatial misalignment between the attention map of a rare token and its corresponding concept-specific region. To address these issues, we propose Token-Aware LoRA (TARA), which introduces a token mask that explicitly constrains each module to focus on its associated rare token, avoiding interference, and a training objective that encourages the spatial attention of a rare token to align with its concept region. Our method enables training-free multi-concept composition by directly injecting multiple independently trained TARA modules at inference time. Experimental results demonstrate that TARA enables efficient multi-concept inference and effectively preserves the visual identity of each concept by avoiding mutual interference between LoRA modules. The code and models are available at https://github.com/YuqiPeng77/TARA.
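The token-mask idea can be sketched in a few lines: each LoRA module's low-rank update is gated so it only modifies the hidden states at the positions of its own rare token, which is what prevents two composed modules from interfering on shared tokens. This is a minimal NumPy illustration under assumed shapes and field names (`down`, `up`, `scale`, `rare_token` are hypothetical), not the authors' implementation:

```python
import numpy as np

def token_masked_lora(h, loras, token_ids):
    """Apply several LoRA updates to text-encoder hidden states, each gated
    by a token mask so a module only touches its own rare-token positions.

    h:         (seq_len, dim) hidden states
    token_ids: (seq_len,) prompt token ids
    loras:     list of dicts with keys "down" (dim, r), "up" (r, dim),
               "scale" (float), "rare_token" (int) -- illustrative schema
    """
    out = h.copy()
    for lora in loras:
        # 1 at positions holding this module's rare token, 0 elsewhere
        mask = (token_ids == lora["rare_token"]).astype(h.dtype)[:, None]
        # standard low-rank update: h @ A @ B, scaled
        delta = (h @ lora["down"]) @ lora["up"] * lora["scale"]
        # gating confines the update to the module's own token positions
        out += mask * delta
    return out
```

With two such modules bound to different rare tokens, each update lands only on its own token's positions, so composing them at inference requires no joint fine-tuning.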