🤖 AI Summary
Existing multi-domain adaptation with LoRA suffers from low combinatorial efficiency and reliance on joint fine-tuning. Method: We propose a training-free, direct additive fusion of LoRA modules, grounded in the hypothesis that high-dimensional parameter updates are approximately orthogonal; we empirically establish a strong linear correlation between the cosine similarity of task-specific LoRA update directions and the performance degradation upon composition. Contribution/Results: On GPT-2 Small, LoRA adapters trained separately on mathematics, medicine, and finance domains achieve competitive performance when additively combined; e.g., mathematics + medicine reduces perplexity by 9.10%, matching joint fine-tuning on merged data. This work provides the first systematic empirical validation of LoRA module additivity and its geometric interpretability, establishing a new paradigm for efficient, scalable, modular adaptation of large language models.
📄 Abstract
Recent advances in large language models are driven by scale, while parameter-efficient fine-tuning (PEFT) enables updating only a small fraction of parameters. Low-Rank Adaptation (LoRA) stores parameter deltas as the product of two small matrices, which makes them natural building blocks for composition. Motivated by the superposition principle, we hypothesize that LoRA modules trained independently on disjoint domains are approximately orthogonal and can be combined by simple addition. Using GPT-2 Small (117M) with LoRA rank 4 and alpha=64, we train adapters for three QA domains (math, medicine, finance). In pairwise tests, adding the Math+Medicine adapters reduces perplexity by 9.10% relative to merged-data fine-tuning, while Math+Finance and Finance+Medicine degrade it by +4.54% and +27.56%, respectively. Across combinations, the RMS cosine similarity between LoRA deltas correlates positively and approximately linearly with the change in perplexity. Naive summation requires no additional training, can be applied in seconds, and achieves performance comparable to models trained on merged data, while clarifying when interference appears in higher-order compositions.
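As a minimal sketch of the two core operations described above (using NumPy; the dimensions, variable names, and random deltas are illustrative assumptions, not the paper's actual weights), additive fusion simply sums each adapter's low-rank update (alpha/r)·BA into the frozen weight, and the interference signal is the cosine similarity between the flattened deltas:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 64  # toy hidden size; LoRA rank 4 and alpha 64 as in the paper

W = rng.standard_normal((d, d))  # stands in for a frozen pretrained weight matrix

def lora_delta(rng, d, r, alpha):
    """One adapter's weight update: (alpha / r) * B @ A, with small random B, A."""
    A = rng.standard_normal((r, d)) * 0.01
    B = rng.standard_normal((d, r)) * 0.01
    return (alpha / r) * (B @ A)

delta_math = lora_delta(rng, d, r, alpha)
delta_med = lora_delta(rng, d, r, alpha)

# Training-free fusion: add both domain deltas directly to the frozen weight.
W_fused = W + delta_math + delta_med

def cosine(u, v):
    """Cosine similarity between two weight updates, flattened to vectors."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(delta_math, delta_med)
# Independent high-dimensional updates are near-orthogonal, so |sim| is small;
# the paper's finding is that larger similarity predicts larger perplexity degradation.
print(f"cosine similarity of deltas: {sim:.4f}")
```

In a real adapter stack the same addition would be applied per target module (e.g. each attention projection), and the paper's RMS cosine similarity aggregates these per-module similarities across the network.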