Efficient Modular Learning through Naive LoRA Summation: Leveraging Orthogonality in High-Dimensional Models

πŸ“… 2025-08-16
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing approaches to multi-domain adaptation with LoRA suffer from poor combinatorial efficiency and a reliance on joint fine-tuning. Method: we propose training-free, direct additive fusion of LoRA modules, grounded in the hypothesis that high-dimensional parameter updates are approximately orthogonal; we empirically establish a strong linear correlation between the cosine similarity of task-specific LoRA update directions and the performance degradation upon composition. Contribution/Results: on GPT-2 Small, LoRA adapters trained separately on the mathematics, medicine, and finance domains achieve competitive performance when additively combinedβ€”e.g., mathematics + medicine reduces perplexity by 9.10%, matching joint fine-tuning on merged data. This work provides the first systematic empirical validation of LoRA module additivity and its geometric interpretability, establishing a new paradigm for efficient, scalable, modular adaptation of large language models.

πŸ“ Abstract
Recent advances in large language models are driven by scale, while parameter-efficient fine-tuning (PEFT) updates only a small fraction of parameters. Low-Rank Adaptation (LoRA) stores parameter deltas as the product of two small matrices, making adapters natural building blocks for composition. Motivated by the superposition principle, we hypothesize that LoRA modules trained independently on disjoint domains are approximately orthogonal and can be combined by simple addition. Using GPT-2 Small (117M) with LoRA rank 4 and alpha = 64, we train adapters for three QA domains (math, medicine, finance). In pairwise tests, adding the Math+Medicine adapters reduces perplexity by 9.10% relative to merged-data fine-tuning, while Math+Finance and Finance+Medicine increase it by 4.54% and 27.56%, respectively. Across combinations, the RMS cosine similarity between LoRA deltas correlates positively and approximately linearly with the change in perplexity. Naive summation requires no additional training, can be applied in seconds, and achieves performance comparable to models trained on merged data, while clarifying when interference appears in higher-order compositions.
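The additive fusion described in the abstract can be sketched in a few lines. Below is a minimal NumPy illustration, not the paper's code: shapes are arbitrary, the adapter matrices are random placeholders rather than trained weights, and the variable names (`B_math`, `A_med`, etc.) are hypothetical. It shows the mechanics only: each LoRA stores a delta as a low-rank product scaled by alpha/rank, and the combined model simply adds the deltas to the frozen base weight, with no further training.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 64  # rank 4, alpha 64, as in the paper
scale = alpha / r

# Hypothetical adapters for two domains: each LoRA stores B (d_out x r) and A (r x d_in).
def make_lora():
    return rng.normal(size=(d_out, r)) * 0.01, rng.normal(size=(r, d_in)) * 0.01

B_math, A_math = make_lora()
B_med, A_med = make_lora()

# Each adapter's weight delta is scale * B @ A (rank at most r).
delta_math = scale * B_math @ A_math
delta_med = scale * B_med @ A_med

# Naive summation: apply the sum of the deltas to the frozen base weight W0.
W0 = rng.normal(size=(d_out, d_in))
W_combined = W0 + delta_math + delta_med
```

If the two deltas are nearly orthogonal in the high-dimensional weight space, each addition perturbs the other domain's update only slightly, which is the geometric condition the paper ties to low interference.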
Problem

Research questions and friction points this paper is trying to address.

Combining independently trained LoRA modules efficiently
Assessing orthogonality of LoRA modules in high-dimensional models
Evaluating performance of naive LoRA summation across domains
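The orthogonality question above can be probed with a simple diagnostic. The sketch below (my own illustration, with hypothetical layer shapes, not the paper's implementation) computes per-layer cosine similarity between two adapters' weight deltas and aggregates it as an RMS, the statistic the abstract reports as correlating with perplexity change. It also demonstrates why the orthogonality hypothesis is plausible: independently drawn high-dimensional updates have cosine similarity concentrated near zero.

```python
import numpy as np

def cosine_similarity(delta_a, delta_b):
    """Cosine similarity between two LoRA weight updates, flattened to vectors."""
    a, b = delta_a.ravel(), delta_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rms_cosine(deltas_a, deltas_b):
    """Root-mean-square of per-layer cosine similarities between two adapters."""
    sims = [cosine_similarity(da, db) for da, db in zip(deltas_a, deltas_b)]
    return float(np.sqrt(np.mean(np.square(sims))))

# Independently drawn updates in high dimensions are nearly orthogonal,
# so their per-layer cosine similarities concentrate near zero.
rng = np.random.default_rng(0)
random_a = [rng.normal(size=(768, 768)) for _ in range(4)]
random_b = [rng.normal(size=(768, 768)) for _ in range(4)]
low = rms_cosine(random_a, random_b)   # near zero: weak interference expected
high = rms_cosine(random_a, random_a)  # 1 for identical adapters: maximal overlap
```

Under the paper's finding, a low RMS cosine between two domain adapters predicts that naive summation will preserve performance, while a high value predicts degradation.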
Innovation

Methods, ideas, or system contributions that make the work stand out.

Naive LoRA summation for modular learning
Orthogonal LoRA modules combined by addition
No extra training needed for combination
πŸ‘₯ Authors
Zhanhao Cao (University of California, Los Angeles)
Clement Truong (UCLA)
Andrew Lizarraga (PhD Student @ UCLA; Representation Learning, Generative Modeling, Statistics)