🤖 AI Summary
Existing multi-domain adaptation with LoRA suffers from low combinatorial efficiency and reliance on joint fine-tuning. Method: We propose a training-free, direct additive fusion of LoRA modules, grounded in the hypothesis that high-dimensional parameter updates are approximately orthogonal; we empirically establish a strong linear correlation between the cosine similarity of task-specific LoRA update directions and the performance degradation upon composition. Contribution/Results: On GPT-2 Small, LoRA adapters trained separately on mathematics, medicine, and finance domains achieve competitive performance when additively combined; e.g., mathematics + medicine reduces perplexity by 9.10%, matching joint fine-tuning on merged data. This work provides the first systematic empirical validation of LoRA module additivity and its geometric interpretability, establishing a new paradigm for efficient, scalable, modular adaptation of large language models.
📄 Abstract
Recent advances in large language models are driven by scale, while parameter-efficient fine-tuning (PEFT) enables updating only a small fraction of parameters. Low-Rank Adaptation (LoRA) stores parameter deltas as the product of two small matrices, which makes them natural building blocks for composition. Motivated by the superposition principle, we hypothesize that LoRA modules trained independently on disjoint domains are approximately orthogonal and can be combined by simple addition. Using GPT-2 Small (117M) with LoRA rank 4 and alpha=64, we train adapters for three QA domains (math, medicine, finance). In pairwise tests, adding the Math+Medicine adapters reduces perplexity by 9.10% relative to merged-data fine-tuning, while Math+Finance and Finance+Medicine degrade it by +4.54% and +27.56%, respectively. Across combinations, the RMS cosine similarity between LoRA deltas correlates positively and approximately linearly with the change in perplexity. Naive summation requires no additional training, can be applied in seconds, and achieves performance comparable to models trained on merged data, while clarifying when interference appears in higher-order compositions.
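As a minimal sketch of the two core operations described above (using NumPy; the dimensions, variable names, and random deltas are illustrative assumptions, not the paper's actual weights), additive fusion simply sums each adapter's low-rank update (alpha/r)·BA into the frozen weight, and the interference signal is the cosine similarity between the flattened deltas:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 64  # toy hidden size; LoRA rank 4 and alpha 64 as in the paper

W = rng.standard_normal((d, d))  # stands in for a frozen pretrained weight matrix

def lora_delta(rng, d, r, alpha):
    """One adapter's weight update: (alpha / r) * B @ A, with small random B, A."""
    A = rng.standard_normal((r, d)) * 0.01
    B = rng.standard_normal((d, r)) * 0.01
    return (alpha / r) * (B @ A)

delta_math = lora_delta(rng, d, r, alpha)
delta_med = lora_delta(rng, d, r, alpha)

# Training-free fusion: add both domain deltas directly to the frozen weight.
W_fused = W + delta_math + delta_med

def cosine(u, v):
    """Cosine similarity between two weight updates, flattened to vectors."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(delta_math, delta_med)
# Independent high-dimensional updates are near-orthogonal, so |sim| is small;
# the paper's finding is that larger similarity predicts larger perplexity degradation.
print(f"cosine similarity of deltas: {sim:.4f}")
```

In a real adapter stack the same addition would be applied per target module (e.g. each attention projection), and the paper's RMS cosine similarity aggregates these per-module similarities across the network.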