🤖 AI Summary
This work addresses low concept fidelity and poor scalability in multi-concept composition, both caused by representational interference. The authors propose SeqLoRA (Sequential regularized LoRA), which introduces dual-layer orthogonal constraints into LoRA training for the first time and learns both LoRA factors jointly through bilevel optimization. The approach constrains parameter adaptation during continual learning to mitigate interference and preserve previously acquired concepts. On the theory side, the authors establish convergence guarantees, prove that learning the LoRA basis from data reduces residual interference energy more effectively than frozen-basis methods, and derive a high-probability upper bound on catastrophic forgetting. Experiments on multi-concept image generation show that SeqLoRA improves identity preservation and scalability across up to 101 concepts, avoids costly post-hoc fusion, and reduces attribute interference in composed generations.
📝 Abstract
Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit expressiveness and concept fidelity. To address this trade-off, we propose Sequential regularized LoRA (SeqLoRA), a constrained continual learning framework that jointly optimizes both LoRA factors via bilevel optimization. Theoretically, we establish strong convergence guarantees for our algorithm and model the residual layer activations as a matrix sub-Gaussian process to derive high-probability bounds on catastrophic forgetting. We further prove that learning the LoRA basis from data minimizes residual interference energy more effectively than frozen-basis methods. Experiments on multi-concept image generation demonstrate that SeqLoRA improves identity preservation and scalability across up to 101 concepts, while avoiding costly fusion and reducing attribute interference in composed generations.
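To make "interference" concrete, the sketch below uses NumPy to form a LoRA update `dW = B @ A` and measure how strongly it perturbs the activations of an earlier concept. It then shows that projecting the factor `A` onto the orthogonal complement of those activations drives the perturbation to zero. This is a generic illustration and not the SeqLoRA algorithm: the names `interference_energy`, `project_out`, and `X_old`, the dimensions, and the projection step are all assumptions made for this example.

```python
import numpy as np

def lora_delta(B, A):
    """Low-rank weight update dW = B @ A, with B: (d_out, r) and A: (r, d_in)."""
    return B @ A

def interference_energy(delta_new, X_old):
    """Squared Frobenius norm of the new update applied to old-concept inputs."""
    return float(np.linalg.norm(delta_new @ X_old, "fro") ** 2)

def project_out(A, X_old):
    """Remove from each row of A its component in the column span of X_old."""
    Q, _ = np.linalg.qr(X_old)  # orthonormal basis of span(X_old), shape (d_in, k)
    return A - (A @ Q) @ Q.T

rng = np.random.default_rng(0)
d_out, d_in, r, k = 8, 16, 2, 4          # illustrative sizes (assumed)
X_old = rng.standard_normal((d_in, k))   # hypothetical activations of an earlier concept
B = rng.standard_normal((d_out, r))
A = rng.standard_normal((r, d_in))

# Energy before and after constraining A to be orthogonal to the old inputs.
before = interference_energy(lora_delta(B, A), X_old)
after = interference_energy(lora_delta(B, project_out(A, X_old)), X_old)
print(f"interference before: {before:.3e}, after projection: {after:.3e}")
```

A hard projection like this corresponds to the frozen-subspace style of constraint. The abstract argues that learning the basis from data reduces residual interference more effectively, which this sketch does not attempt to reproduce.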