MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing

📅 2025-05-05
🤖 AI Summary
Multi-subject customization faces two key challenges: the scarcity of multi-subject training data and the entanglement of attributes across subjects. To address these, the paper proposes MUSAR, a framework that enables robust multi-subject generation while training only on single-subject data. The approach introduces (1) debiased diptych learning, which builds diptych training pairs from single-subject images and corrects the distribution bias this construction introduces by using static attention routing and a dual-branch LoRA, and (2) a dynamic attention routing mechanism that adaptively establishes bijective mappings between generated images and conditional subjects, decoupling the subjects' representations. Experiments reported by the authors show that MUSAR outperforms existing methods, including those trained on multi-subject datasets, in image quality, subject consistency, and interaction naturalness. Because it needs no multi-subject training data, the method is more practical to train, and its generalization holds up as the number of reference subjects increases.

📝 Abstract
Current multi-subject customization approaches encounter two critical challenges: the difficulty of acquiring diverse multi-subject training data, and attribute entanglement across different subjects. To bridge these gaps, we propose MUSAR, a simple yet effective framework that achieves robust multi-subject customization while requiring only single-subject training data. First, to break the data limitation, we introduce debiased diptych learning. It constructs diptych training pairs from single-subject images to facilitate multi-subject learning, while actively correcting the distribution bias introduced by diptych construction via static attention routing and a dual-branch LoRA. Second, to eliminate cross-subject entanglement, we introduce a dynamic attention routing mechanism, which adaptively establishes bijective mappings between generated images and conditional subjects. This design not only decouples multi-subject representations but also maintains scalable generalization performance as the number of reference subjects increases. Comprehensive experiments demonstrate that MUSAR outperforms existing methods, even those trained on multi-subject datasets, in image quality, subject consistency, and interaction naturalness, despite requiring only a single-subject dataset.
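As a rough illustration of the diptych idea in the abstract, the following Python sketch joins two single-subject images side by side and builds a static routing mask in which each panel's tokens may attend only to the reference tokens of that panel's subject. The abstract does not give the exact construction or masking rule, so the function names (`make_diptych`, `static_routing_mask`), the list-of-rows image format, and the "panel index equals reference index" rule are all assumptions made for this sketch, not the authors' implementation.

```python
def make_diptych(left, right):
    """Join two images (lists of pixel rows) side by side into one diptych.

    Hypothetical helper: assumes both panels have the same height.
    """
    if len(left) != len(right):
        raise ValueError("panels must have the same height")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def static_routing_mask(panel_of_token, ref_of_token):
    """Build a boolean attention mask for a diptych.

    panel_of_token[q] is the panel index (0 = left, 1 = right) of image token q.
    ref_of_token[k] is the subject index of reference token k.
    mask[q][k] is True when image token q is allowed to attend to reference
    token k. The rule used here (same index only) is an assumed stand-in for
    the paper's static attention routing.
    """
    return [[p == r for r in ref_of_token] for p in panel_of_token]


# Two tiny 2x2 "images", one per subject.
subject_a = [[1, 1], [1, 1]]
subject_b = [[2, 2], [2, 2]]
diptych = make_diptych(subject_a, subject_b)

# One row of four image tokens: the first two lie in the left panel.
mask = static_routing_mask(panel_of_token=[0, 0, 1, 1], ref_of_token=[0, 1])
```

In an actual attention layer, a mask of this shape would typically be applied to the attention logits before the softmax, so that disallowed query-key pairs receive zero weight.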
Problem

Research questions and friction points this paper is trying to address.

Overcoming multi-subject data scarcity with single-subject datasets
Resolving attribute entanglement across different subjects
Enhancing customization via a dynamic attention routing mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Debiased diptych learning from single-subject data
Dynamic attention routing for subject decoupling
Dual-branch LoRA for bias correction
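
The abstract describes dynamic attention routing as establishing bijective mappings between generated images and conditional subjects, but it does not say how the mapping is computed. The sketch below shows one hypothetical way to obtain a one-to-one assignment: given an assumed matrix of attention mass from each image region to each reference subject, it picks the permutation with the highest total mass. The function name `bijective_routing`, the region-level granularity, and the brute-force search are illustrative assumptions only.

```python
from itertools import permutations


def bijective_routing(attn_mass):
    """Return a one-to-one assignment of image regions to reference subjects.

    attn_mass[i][j] is the (assumed) total attention that image region i
    places on the tokens of reference subject j. The matrix must be square.
    The result is a tuple where result[i] is the subject assigned to region i.
    Brute force over permutations is used for clarity; it is only practical
    for a handful of subjects.
    """
    n = len(attn_mass)
    if any(len(row) != n for row in attn_mass):
        raise ValueError("attn_mass must be a square matrix")
    return max(
        permutations(range(n)),
        key=lambda perm: sum(attn_mass[i][perm[i]] for i in range(n)),
    )


# Region 0 attends mostly to subject 1, region 1 mostly to subject 0.
assignment = bijective_routing([[0.2, 0.8], [0.7, 0.3]])
```

Unlike a per-region argmax, a permutation-based assignment cannot route two regions to the same subject, which is the property the word "bijective" implies.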
Authors

Zinan Guo, ByteDance Intelligent Creation
Pengze Zhang, ByteDance Intelligent Creation
Yanze Wu, ByteDance (computer vision)
Chong Mou, Peking University (Diffusion Model, AI Generated Content, Low-level Computer Vision)
Songtao Zhao, ByteDance Intelligent Creation
Qian He, ByteDance