Sharp Generalization Bounds for Foundation Models with Asymmetric Randomized Low-Rank Adapters

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the open problem of how asymmetric stochastic Low-Rank Adaptation (LoRA) generalizes under a single fixed random factor: existing theoretical analyses only provide expectation-based upper bounds averaged over multiple random draws, offering no high-probability guarantees for an individual fine-tuning run. We establish, for the first time, a tight generalization bound for frozen-random-factor asymmetric LoRA in the single-run setting. Leveraging tools from random matrix theory and generalization error analysis, we derive a high-probability upper bound of $\tilde{\mathcal{O}}(\sqrt{r}/\sqrt{N})$ and a matching lower bound of $\Omega(1/\sqrt{N})$, thereby characterizing the intrinsic sample complexity as $\sqrt{r}/\sqrt{N}$ and its fundamental relationship to the statistical limit $1/\sqrt{N}$. Our result provides the first concentration-style characterization of the LoRA generalization gap, substantially strengthening the theoretical reliability of parameter-efficient fine-tuning in foundation-model deployment.
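
As a concrete picture of the regime analyzed above (one low-rank factor drawn once at random and kept frozen, with only the other factor trained), here is a minimal PyTorch sketch. The class name, the choice of which factor is frozen, and the $1/\sqrt{r}$ scaling are illustrative assumptions, not the authors' construction.

```python
# Minimal sketch (an assumption, not the paper's code) of asymmetric LoRA with a
# frozen random factor: B is drawn once and never trained; only A is updated.
import math
import torch
import torch.nn as nn

class AsymmetricLoRALinear(nn.Module):
    """Frozen pretrained weight W0 plus a rank-r update B @ A, where B is a
    single fixed Gaussian draw and A is the only trainable factor."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        # Stand-in for a frozen foundation-model weight.
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        # B: one random draw, frozen for the entire fine-tuning run
        # (the 1/sqrt(rank) scaling is an illustrative choice).
        self.B = nn.Parameter(torch.randn(out_features, rank) / math.sqrt(rank),
                              requires_grad=False)
        # A: initialized at zero so fine-tuning starts exactly at the pretrained model.
        self.A = nn.Parameter(torch.zeros(rank, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: W0 + B @ A, a rank-r perturbation of the frozen weight.
        return x @ (self.weight + self.B @ self.A).T
```

Only `A` receives gradients here, so the number of trained parameters grows with the rank $r$ that appears in the bounds above.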

📝 Abstract
Low-Rank Adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning (PEFT) technique for foundation models. Recent work has highlighted an inherent asymmetry in the initialization of LoRA's low-rank factors, which has been present since its inception and was presumably derived experimentally. This paper focuses on providing a comprehensive theoretical characterization of asymmetric LoRA with frozen random factors. First, while existing research provides upper-bound generalization guarantees based on averages over multiple experiments, the behaviour of a single fine-tuning run with specific random factors remains an open question. We address this by investigating the concentration of the typical LoRA generalization gap around its mean. Our main upper bound reveals a sample complexity of $\tilde{\mathcal{O}}\left(\frac{\sqrt{r}}{\sqrt{N}}\right)$ with high probability for rank $r$ LoRAs trained on $N$ samples. Additionally, we also determine the fundamental limits in terms of sample efficiency, establishing a matching lower bound of $\mathcal{O}\left(\frac{1}{\sqrt{N}}\right)$. By more closely reflecting the practical scenario of a single fine-tuning run, our findings offer crucial insights into the reliability and practicality of asymmetric LoRA.
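
Read schematically, the abstract's two statements amount to a concentration bound plus a statistical floor. The display below is an illustrative paraphrase, not the paper's theorem: the exact definition of the generalization gap $\mathrm{gen}_N$, the constants, and the logarithmic factors hidden in $\tilde{\mathcal{O}}$ are assumptions here.

$$\big|\mathrm{gen}_N - \mathbb{E}[\mathrm{gen}_N]\big| \;\le\; \tilde{\mathcal{O}}\!\left(\frac{\sqrt{r}}{\sqrt{N}}\right) \quad \text{with high probability over the $N$ samples,}$$

together with a matching lower bound of order $1/\sqrt{N}$, which no method in this setting can improve upon.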
Problem

Research questions and friction points this paper is trying to address.

Theoretical understanding of the impact of asymmetric LoRA initialization
Concentration of the LoRA generalization gap in a single fine-tuning run
Sample complexity bounds for rank-$r$ LoRA fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-probability generalization analysis of asymmetric LoRA with a frozen random factor
$\tilde{\mathcal{O}}(\sqrt{r}/\sqrt{N})$ sample complexity bound for rank-$r$ LoRAs
Matching $\Omega(1/\sqrt{N})$ lower bound on sample efficiency