Dynamic Training-Free Fusion of Subject and Style LoRAs

📅 2026-02-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing LoRA fusion methods rely on static heuristic strategies that overlook input stochasticity and the adaptive nature of LoRA, making it challenging to harmonize subject and style generation. This work proposes the first training-free dynamic fusion framework: during forward propagation, layer-wise LoRA weights are dynamically selected based on KL divergence, and in the reverse denoising phase, latent space refinement is guided by gradients derived from CLIP and DINO semantic metrics. This approach achieves, for the first time, dynamic, feature-level LoRA fusion throughout the entire diffusion process. It substantially outperforms existing methods across diverse subjectโ€“style combinations, demonstrating state-of-the-art performance in both qualitative and quantitative evaluations.
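The forward-pass selection step can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the function names, the softmax normalization of features, and the "keep the branch with smaller divergence" rule are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Turn a feature vector into a probability distribution."""
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def select_lora_branch(base_feat, subject_feat, style_feat):
    """For one LoRA-applied layer, compare how far the subject- and
    style-LoRA features drift from the base model's features, and keep
    the branch with the smaller KL divergence (hypothetical rule)."""
    p = softmax(base_feat)
    kl_subject = kl_divergence(p, softmax(subject_feat))
    kl_style = kl_divergence(p, softmax(style_feat))
    return "subject" if kl_subject <= kl_style else "style"
```

In a real diffusion U-Net this comparison would run per layer and per timestep on the actual intermediate activations; plain vectors stand in for those features here.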

๐Ÿ“ Abstract
Recent studies have explored the combination of multiple LoRAs to simultaneously generate user-specified subjects and styles. However, most existing approaches fuse LoRA weights using static statistical heuristics that deviate from LoRA's original purpose of learning adaptive feature adjustments and ignore the randomness of sampled inputs. To address this, we propose a dynamic training-free fusion framework that operates throughout the generation process. During the forward pass, at each LoRA-applied layer, we dynamically compute the KL divergence between the base model's original features and those produced by subject and style LoRAs, respectively, and adaptively select the most appropriate weights for fusion. In the reverse denoising stage, we further refine the generation trajectory by dynamically applying gradient-based corrections derived from objective metrics such as CLIP and DINO scores, providing continuous semantic and stylistic guidance. By integrating these two complementary mechanisms (feature-level selection and metric-guided latent adjustment) across the entire diffusion timeline, our method dynamically achieves coherent subject-style synthesis without any retraining. Extensive experiments across diverse subject-style combinations demonstrate that our approach consistently outperforms state-of-the-art LoRA fusion methods both qualitatively and quantitatively.
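The reverse-stage refinement can be illustrated with a toy latent and a stand-in encoder. This is a minimal sketch under stated assumptions: `embed` stands in for a CLIP or DINO feature extractor, the score is a generic cosine similarity to a target embedding, and the gradient is estimated by finite differences rather than backpropagation, none of which is the paper's exact formulation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def refine_latent(z, target_emb, embed, lr=0.1, steps=20):
    """Gradient-based latent correction: nudge the latent z so that
    embed(z) moves toward target_emb.  A real implementation would
    backpropagate through the encoder; here the gradient of the
    cosine score is estimated by finite differences."""
    eps = 1e-4
    for _ in range(steps):
        base_score = cosine(embed(z), target_emb)
        grad = np.zeros_like(z)
        for i in range(z.size):
            z_pert = z.copy()
            z_pert[i] += eps
            grad[i] = (cosine(embed(z_pert), target_emb) - base_score) / eps
        z = z + lr * grad  # ascend the semantic/stylistic score
    return z
```

In the paper's setting this correction would be applied at denoising steps, with CLIP providing the style/semantic score and DINO the subject-identity score; the toy identity encoder below merely demonstrates that the update moves the latent toward the target.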
Problem

Research questions and friction points this paper is trying to address.

LoRA fusion
subject-style synthesis
diffusion models
dynamic fusion
training-free
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic LoRA Fusion
Training-Free
KL Divergence
Gradient-Based Refinement
Diffusion Models