CLoRA: A Contrastive Approach to Compose Multiple LoRA Models

📅 2024-03-28

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 1

career value

187K/year

🤖 AI Summary

To address semantic conflicts and concept loss arising from attention overlap in multi-LoRA collaborative generation, this paper proposes the Contrastive Attention Update Mechanism (CAUM). CAUM employs contrastive learning to construct fine-grained semantic masks, guiding decoupling and adaptive fusion of attention maps across LoRA modules in the latent space—enabling unbiased co-generation of independent concepts (e.g., cat and dog). Notably, it is the first method to explicitly model cross-module semantic consistency within the LoRA parameter space, without requiring backbone model fine-tuning. Quantitatively, CAUM achieves state-of-the-art performance, reducing FID by 12.3% and improving CLIP-Score by 8.7%. User studies further confirm its superiority, with an 86.4% preference rate. To foster reproducibility and community advancement, we publicly release our code, a multi-concept benchmark dataset, and pre-trained LoRA models.

Technology Category

Application Category

📝 Abstract

Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation, offering a highly effective way to adapt and refine pre-trained deep learning models for specific tasks without the need for comprehensive retraining. By employing pre-trained LoRA models, such as those representing a specific cat and a particular dog, the objective is to generate an image that faithfully embodies both animals as defined by the LoRAs. However, the task of seamlessly blending multiple concept LoRAs to capture a variety of concepts in one image proves to be a significant challenge. Common approaches often fall short, primarily because the attention mechanisms within different LoRA models overlap, leading to scenarios where one concept may be completely ignored (e.g., omitting the dog) or where concepts are incorrectly combined (e.g., producing an image of two cats instead of one cat and one dog). To overcome these issues, CLoRA addresses them by updating the attention maps of multiple LoRA models and leveraging them to create semantic masks that facilitate the fusion of latent representations. Our method enables the creation of composite images that truly reflect the characteristics of each LoRA, successfully merging multiple concepts or styles. Our comprehensive evaluations, both qualitative and quantitative, demonstrate that our approach outperforms existing methodologies, marking a significant advancement in the field of image generation with LoRAs. Furthermore, we share our source code, benchmark dataset, and trained LoRA models to promote further research on this topic.

Problem

Research questions and friction points this paper is trying to address.

Combine multiple LoRA models for multi-concept image generation

Prevent attention overlap between different LoRA models

Generate accurate composite images with distinct LoRA characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Updates attention maps of multiple LoRA models

Creates semantic masks for fusing latent representations

Generates composite images reflecting each LoRA's characteristics

🔎 Similar Papers

No similar papers found.