Bi-LoRA: Efficient Sharpness-Aware Minimization for Fine-Tuning Large-Scale Models

📅 2025-08-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address poor generalization in few-shot fine-tuning of large pre-trained models and the prohibitive memory and computation overhead of Sharpness-Aware Minimization (SAM), this paper proposes Bi-LoRA, a bi-directional low-rank adaptation framework. Its core idea is a dual-LoRA design: a primary LoRA module adapts to the task via standard gradient descent, while an auxiliary LoRA module models SAM's adversarial weight perturbations via gradient ascent, capturing the sharpness of the loss landscape. Because the two modules are decoupled, optimization and perturbation can run simultaneously in a single pass, eliminating SAM's costly second forward/backward computation. The design retains LoRA's parameter efficiency while making sharpness-aware fine-tuning scalable to large models. Experiments across diverse tasks and architectures show that Bi-LoRA consistently improves generalization over baselines, with memory and compute overhead close to standard LoRA and substantially below SAM.
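As a concrete illustration of the dual-module idea, below is a minimal PyTorch sketch of what such a dual-LoRA linear layer might look like. The class name, initialization, and scaling convention are assumptions for illustration; the paper's actual implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLoRALinear(nn.Module):
    """Hypothetical dual-LoRA layer: a frozen base weight W plus a primary
    low-rank adapter (trained by gradient descent for the task) and an
    auxiliary low-rank adapter (trained by gradient ascent to model
    SAM-style adversarial weight perturbations)."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pre-trained weight stays frozen
        self.scale = alpha / r
        # Primary LoRA module: task adaptation (descent direction).
        self.A_p = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B_p = nn.Parameter(torch.zeros(out_features, r))
        # Auxiliary LoRA module: adversarial perturbation (ascent direction).
        self.A_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B_a = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight: W + B_p A_p (adaptation) + B_a A_a (perturbation).
        delta = self.scale * (self.B_p @ self.A_p + self.B_a @ self.A_a)
        return F.linear(x, self.base.weight + delta)
```

Because both adapters are rank-r factor pairs, the extra memory over standard LoRA is just one additional pair of low-rank matrices per adapted layer.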

๐Ÿ“ Abstract
Fine-tuning large-scale pre-trained models with limited data presents significant challenges for generalization. While Sharpness-Aware Minimization (SAM) has proven effective in improving generalization by seeking flat minima, its substantial extra memory and computation overhead make it impractical for large models. Integrating SAM with parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) is a promising direction. However, we find that directly applying SAM to LoRA parameters limits the sharpness optimization to a restricted subspace, hindering its effectiveness. To address this limitation, we propose Bi-directional Low-Rank Adaptation (Bi-LoRA), which introduces an auxiliary LoRA module to model SAM's adversarial weight perturbations. It decouples SAM's weight perturbations from LoRA optimization: the primary LoRA module adapts to specific tasks via standard gradient descent, while the auxiliary module captures the sharpness of the loss landscape through gradient ascent. This dual-module design enables Bi-LoRA to capture broader sharpness for achieving flatter minima while remaining memory-efficient. Another important benefit is that the dual design allows for simultaneous optimization and perturbation, eliminating SAM's doubled training costs. Extensive experiments across diverse tasks and architectures demonstrate Bi-LoRA's efficiency and effectiveness in enhancing generalization.
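To make "simultaneous optimization and perturbation" concrete, here is a hedged sketch of what a single Bi-LoRA training step could look like under the layer sketched above: one forward/backward pass produces gradients for both modules, the primary parameters take a descent step, and the auxiliary parameters take an ascent step kept inside a SAM-style rho-ball. The two-optimizer setup, the sign-flip trick, and the norm constraint are illustrative choices, not the paper's exact procedure.

```python
import torch

def bi_lora_step(model, loss_fn, batch, opt_primary, opt_auxiliary, rho: float = 0.05):
    """One hypothetical Bi-LoRA step: a single forward/backward pass,
    descent on the primary adapter, ascent on the auxiliary adapter."""
    opt_primary.zero_grad()
    opt_auxiliary.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()  # one backward pass yields gradients for both modules

    # Primary LoRA parameters: ordinary gradient descent on the task loss.
    opt_primary.step()

    # Auxiliary LoRA parameters: gradient *ascent* toward higher loss,
    # implemented by flipping the gradient sign before stepping.
    aux_params = [p for g in opt_auxiliary.param_groups for p in g["params"]]
    for p in aux_params:
        if p.grad is not None:
            p.grad.neg_()
    opt_auxiliary.step()

    # Keep the perturbation bounded, mimicking SAM's rho-ball constraint
    # (an assumption; the paper may bound the auxiliary module differently).
    with torch.no_grad():
        norm = torch.sqrt(sum((p ** 2).sum() for p in aux_params))
        if norm > rho:
            for p in aux_params:
                p.mul_(rho / norm)
    return loss.item()
```

Unlike SAM, no second forward/backward pass is needed: the perturbation lives in the auxiliary module's parameters and is refined across steps rather than recomputed from scratch each iteration.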
Problem

Research questions and friction points this paper is trying to address.

Improves generalization when fine-tuning large models on limited data
Reduces the memory and computation overhead of sharpness-aware minimization, which normally doubles per-step training cost (see the sketch after this list)
Overcomes the restricted-subspace limitation of applying SAM directly to LoRA parameters
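For contrast, below is a minimal sketch of the standard two-pass SAM update (Foret et al., 2021) whose cost Bi-LoRA is designed to avoid: every step requires one forward/backward pass to compute the adversarial perturbation and a second one at the perturbed weights. Helper names are illustrative.

```python
import torch

def sam_step(model, loss_fn, batch, optimizer, rho: float = 0.05):
    """Standard SAM: two forward/backward passes per update step."""
    # Pass 1: the gradient at the current weights defines the perturbation.
    optimizer.zero_grad()
    loss_fn(model, batch).backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
    eps = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)  # climb to the (approximate) worst point in the rho-ball
            eps.append(e)

    # Pass 2: the gradient at the perturbed weights drives the real update.
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)  # restore the original weights
    optimizer.step()
    return loss.item()
```

Bi-LoRA folds the role of this explicit per-step perturbation into its trainable auxiliary LoRA module, so each update costs roughly one pass instead of two.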
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-directional Low-Rank Adaptation (Bi-LoRA): a primary LoRA module plus an auxiliary perturbation module
Decouples task optimization (gradient descent) from sharpness capture (gradient ascent)
Eliminates SAM's doubled training cost by optimizing and perturbing simultaneously