TARA: Token-Aware LoRA for Composable Personalization in Diffusion Models

📅 2025-08-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
When composing multiple LoRA modules for multi-concept image generation, existing methods suffer from identity loss and feature leakage. We identify two root causes: (1) token-level interference among modules, and (2) spatial misalignment between rare tokens and their corresponding concept regions in attention maps. To address these issues, we propose Token-Aware LoRA (TARA), the first method to introduce token masking constraints and spatial attention alignment, enabling plug-and-play composition of pre-trained LoRA modules without additional fine-tuning. TARA enhances each module's focus on concept-specific rare tokens, thereby improving identity fidelity and suppressing cross-module interference. Experiments demonstrate that TARA significantly boosts visual consistency and concept controllability while preserving generation quality. Our approach establishes an efficient, scalable paradigm for modular personalization in diffusion models.
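
To make the token-masking idea concrete, here is a minimal PyTorch sketch of a LoRA linear layer whose low-rank update is gated by a per-token mask, so each module only affects its own rare token. The class name `TokenMaskedLoRALinear` and all signatures are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class TokenMaskedLoRALinear(nn.Module):
    """Sketch of a token-masked LoRA adapter: the low-rank update is applied
    only at masked token positions (illustrative, not the paper's code)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # standard LoRA init: start as a no-op
        self.scale = scale

    def forward(self, text_emb: torch.Tensor, token_mask: torch.Tensor):
        # text_emb:   (batch, seq_len, in_features) prompt embeddings
        # token_mask: (batch, seq_len), 1 at this concept's rare-token
        #             positions and 0 elsewhere, so other tokens bypass
        #             this adapter entirely.
        delta = self.up(self.down(text_emb)) * self.scale
        return self.base(text_emb) + delta * token_mask.unsqueeze(-1)
```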

📝 Abstract
Personalized text-to-image generation aims to synthesize novel images of a specific subject or style using only a few reference images. Recent methods based on Low-Rank Adaptation (LoRA) enable efficient single-concept customization by injecting lightweight, concept-specific adapters into pre-trained diffusion models. However, combining multiple LoRA modules for multi-concept generation often leads to identity loss and visual feature leakage. In this work, we identify two key issues behind these failures: (1) token-wise interference among different LoRA modules, and (2) spatial misalignment between the attention map of a rare token and its corresponding concept-specific region. To address these issues, we propose Token-Aware LoRA (TARA), which introduces a token mask that explicitly constrains each module to focus on its associated rare token to avoid interference, and a training objective that encourages the spatial attention of a rare token to align with its concept region. Our method enables training-free multi-concept composition by directly injecting multiple independently trained TARA modules at inference time. Experimental results demonstrate that TARA enables efficient multi-concept inference and effectively preserves the visual identity of each concept by avoiding mutual interference between LoRA modules. The code and models are available at https://github.com/YuqiPeng77/TARA.
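
The spatial-alignment objective can be pictured as a loss between the rare token's cross-attention map and a mask of its concept region. Below is a hedged sketch of one plausible formulation; the paper's exact loss, tensor shapes, and normalization may differ.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(attn_probs: torch.Tensor,
                             rare_token_idx: int,
                             concept_mask: torch.Tensor) -> torch.Tensor:
    """One plausible form of a spatial-alignment objective (assumed, not
    the paper's exact loss): push the rare token's cross-attention map
    toward its concept region.

    attn_probs:   (batch, heads, spatial, seq_len) cross-attention weights
    concept_mask: (batch, H, W) binary mask of the concept region
    """
    b, h, spatial, _ = attn_probs.shape
    side = int(spatial ** 0.5)  # assume a square latent grid
    # Attention map of the rare token, averaged over heads.
    attn_map = attn_probs[..., rare_token_idx].mean(dim=1).view(b, side, side)
    # Resize the ground-truth region mask to the attention resolution.
    target = F.interpolate(concept_mask.unsqueeze(1).float(),
                           size=(side, side), mode="nearest").squeeze(1)
    # Normalize the map to [0, 1] and penalize attention mass that falls
    # outside the concept region.
    attn_map = attn_map / (attn_map.amax(dim=(1, 2), keepdim=True) + 1e-8)
    return F.mse_loss(attn_map, target)
```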
Problem

Research questions and friction points this paper is trying to address.

Addresses token-wise interference in multi-concept LoRA modules
Solves spatial misalignment between rare tokens and concept regions
Enables training-free multi-concept composition without identity loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token mask prevents interference between LoRA modules
Training objective aligns rare token attention with concept regions
Enables training-free multi-concept composition at inference (see the composition sketch below)
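
Below is a sketch of what training-free composition could look like: independently trained token-masked adapters are injected at inference, and each contributes its update only at its own rare-token positions, so the modules do not overwrite one another. It builds on the hypothetical `TokenMaskedLoRALinear` above and is not the released implementation.

```python
import torch

def composed_lora_delta(text_emb, adapters, token_masks):
    """Combine several independently trained token-masked LoRA adapters
    (one per concept) without any joint fine-tuning.

    adapters:    list of TokenMaskedLoRALinear-style modules
    token_masks: list of (batch, seq_len) masks, one per concept
    """
    delta = None
    for adapter, mask in zip(adapters, token_masks):
        # Each adapter's low-rank update is confined to its own rare token.
        update = adapter.up(adapter.down(text_emb)) * adapter.scale
        update = update * mask.unsqueeze(-1)
        delta = update if delta is None else delta + update
    return delta
```

In a real pipeline, this combined delta would be added to the output of the base key/value projection inside each cross-attention layer, leaving the frozen diffusion backbone untouched.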
Yuqi Peng
Master's student, Northeastern University
Machine Learning · Deep Learning · VLMs
Lingtao Zheng
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Yufeng Yang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Yi Huang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Mingfu Yan
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
AIGC
Jianzhuang Liu
Shenzhen Institutes of Advanced Technology, University of Chinese Academy of Sciences
Computer Vision · Image Processing · AIGC · Machine Learning
Shifeng Chen
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Shenzhen University of Advanced Technology