🤖 AI Summary
LoRA’s random initialization restricts gradient updates to a tangent subspace that is misaligned with the pretrained model’s activation distribution, causing early information loss and slow convergence. To address this, we propose Activation Boundary Matching (ABM), a novel initialization strategy that aligns the activation boundaries of LoRA adapters with those of the backbone model before downstream fine-tuning. ABM maximizes the projection of full-parameter gradients onto the low-rank subspace, substantially reducing optimization bias at initialization. To our knowledge, this is the first work to incorporate activation boundary alignment into LoRA initialization. The method is architecture-agnostic and is validated on T5, LLaMA2, and ViT. Empirical results show accelerated convergence across the GLUE, WizardLM, and VTAB-1K benchmarks; on VTAB-1K, ABM yields a +2.1% average accuracy gain, with particularly pronounced improvements on geometric reasoning tasks, confirming its parameter efficiency and strong generalization.
📝 Abstract
We propose Activation Boundary Matching for Low-Rank Adaptation (ABM-LoRA), a principled initialization strategy that substantially accelerates the convergence of low-rank adapters. While LoRA offers high parameter efficiency, its random initialization restricts gradient updates to a mismatched tangent space, causing significant information loss and hindering early convergence. ABM-LoRA addresses this by aligning the adapter's activation boundaries with those of the pretrained model before downstream training, thereby maximizing the projection of full-parameter gradients into the adapter subspace. This alignment sharply reduces information loss at initialization, yields a lower starting loss, and accelerates convergence. We demonstrate ABM-LoRA's effectiveness across diverse architectures and tasks: language understanding (T5-Base on GLUE), dialogue generation (LLaMA2-7B on WizardLM), and vision recognition (ViT-B/16 on VTAB-1K). On VTAB-1K, it achieves the highest accuracy among all compared methods, with strong gains on structured reasoning tasks requiring geometric understanding.
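The abstract's central claim is that a randomly initialized adapter only captures the component of the full-parameter gradient that falls inside its low-rank subspace. The sketch below illustrates this with standard LoRA initialization (Gaussian `A`, zero `B`); it does not implement ABM itself, whose construction is not specified here, and the dimensions and the toy gradient `g` are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4  # toy hidden size and adapter rank (illustrative, not from the paper)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # standard LoRA: A is random Gaussian
B = np.zeros((d, r))                 # standard LoRA: B starts at zero

x = rng.normal(size=d)               # an input activation

# At initialization the adapter contributes nothing (B @ A = 0),
# so the adapted forward pass matches the pretrained model exactly.
y = W @ x + B @ (A @ x)
assert np.allclose(y, W @ x)

# Let g stand for a full-parameter gradient dL/dW (random here, for illustration).
g = rng.normal(size=(d, d))

# The gradient reaching B is dL/dB = g @ A.T, so the effective weight update
# (dL/dB) @ A lies in the row space of the random A. The orthogonal projection
# of g onto that subspace measures how much of the full gradient the adapter
# can express; the rest is the "information loss" the abstract refers to.
P = A.T @ np.linalg.pinv(A @ A.T) @ A   # projector onto the row space of A
captured = np.linalg.norm(g @ P) / np.linalg.norm(g)
print(f"fraction of gradient norm captured by a random rank-{r} subspace: {captured:.2f}")
```

For a random rank-r subspace the captured fraction concentrates around sqrt(r/d), which is well below 1; an initialization that chooses `A` to align with the model's activation structure, as ABM-LoRA proposes, aims to raise this projection and hence lower the starting loss.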