Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a key limitation of Low-Rank Adaptation (LoRA): performance depends on the low-rank subspace chosen at initialization, yet existing initialization strategies rely solely on geometric properties of the pre-trained weights and ignore how downstream task data affect parameter sensitivity. To address this, the paper introduces a data-aware parameter sensitivity criterion grounded in the Fisher Information Matrix, a curvature-aware framework that quantifies how parameter perturbations change model predictions under the target data distribution. The criterion is used to initialize low-rank adaptation directions that are better aligned with the downstream task. Moving beyond weight-only, magnitude-based initialization, the method is reported to consistently and significantly outperform existing LoRA initialization strategies across diverse tasks and modalities.
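For readers unfamiliar with the Fisher Information Matrix, the textbook definition below shows why it measures how parameter perturbations affect predictions. The symbols ($\theta$ for parameters, $\mathcal{D}$ for the downstream data distribution, $\delta$ for a small perturbation) are notation assumed here for illustration; this summary does not state which estimator or approximation the paper actually uses.

```latex
% Fisher Information Matrix under the downstream input distribution D
F(\theta) \;=\; \mathbb{E}_{x \sim \mathcal{D}}\,
  \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}
  \left[ \nabla_\theta \log p_\theta(y \mid x)\,
         \nabla_\theta \log p_\theta(y \mid x)^{\top} \right]

% Second-order expansion: the expected change in predictions caused by
% a small perturbation \delta is a quadratic form in F
\mathbb{E}_{x \sim \mathcal{D}}\,
  \mathrm{KL}\!\left( p_\theta(\cdot \mid x) \,\big\|\, p_{\theta+\delta}(\cdot \mid x) \right)
  \;\approx\; \tfrac{1}{2}\, \delta^{\top} F(\theta)\, \delta
```

Directions $\delta$ with a large value of $\delta^{\top} F(\theta)\, \delta$ are those along which the model's predictions on the target data change most, which is the sense in which the criterion is data-aware.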

📝 Abstract

LoRA adapts large language models (LLMs) by restricting updates to low-rank subspaces of pre-trained weights. While this substantially reduces training cost, the effectiveness of adaptation critically depends on which subspace is chosen at initialization: a poor initialization that allocates capacity to task-irrelevant directions can severely hinder downstream performance. Existing initialization strategies primarily rely on the intrinsic properties of pre-trained weights, implicitly assuming that weight geometry alone reflects task relevance. However, such criteria overlook how the model interacts with the downstream data distribution. In this work, we formulate LoRA initialization as the problem of identifying which directions in parameter space have the greatest impact under the target data distribution. We argue that data-aware sensitivity, rather than weight-only magnitude, should govern the choice of adaptation subspaces. Building on this perspective, we propose a Fisher-guided framework that leverages curvature information induced by downstream data to characterize how parameter perturbations influence model predictions. This perspective yields a principled, task-dependent criterion for selecting LoRA directions that better align adaptation with the target objective. Empirical results across diverse tasks and modalities demonstrate that data-aware initialization consistently and significantly improves downstream performance over existing approaches.
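The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a Fisher-guided initialization could look like, not the authors' method. It assumes a single softmax layer standing in for a pre-trained weight matrix, a diagonal empirical Fisher estimate (squared per-example gradients using the observed labels), and an SVD of the Fisher-weighted weights to pick directions. The function names `empirical_fisher_diag` and `fisher_guided_lora_init` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def empirical_fisher_diag(W, X, y):
    """Diagonal empirical Fisher of a softmax layer with logits = X @ W.T.

    The per-example gradient of log p(y|x) w.r.t. W is (onehot(y) - p) x^T,
    so its elementwise square is (onehot(y) - p)^2 outer x^2.
    Assumption: observed labels are used instead of labels sampled
    from the model, i.e. this is the *empirical* Fisher.
    """
    n, d_out = X.shape[0], W.shape[0]
    probs = softmax(X @ W.T)              # (n, d_out)
    err = np.eye(d_out)[y] - probs        # (n, d_out)
    return (err ** 2).T @ (X ** 2) / n    # (d_out, d_in)

def fisher_guided_lora_init(W, X, y, rank):
    """Hypothetical initializer: A spans the top right-singular directions
    of the Fisher-weighted weights; B starts at zero so W + B @ A == W."""
    fisher = empirical_fisher_diag(W, X, y)
    weighted = np.sqrt(fisher) * W        # emphasize data-sensitive entries
    _, _, vt = np.linalg.svd(weighted, full_matrices=False)
    A = vt[:rank]                         # (rank, d_in), orthonormal rows
    B = np.zeros((W.shape[0], rank))      # (d_out, rank)
    return A, B

# Toy data standing in for a downstream task (purely synthetic).
rng = np.random.default_rng(0)
d_in, d_out, n, rank = 16, 8, 256, 4
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(n, d_in))
y = rng.integers(0, d_out, size=n)

A, B = fisher_guided_lora_init(W, X, y, rank)
```

Setting `B` to zero keeps the adapted model identical to the pre-trained one at step zero, as in standard LoRA, while `A` already points along directions that the downstream data mark as sensitive. A weight-only baseline would run the same SVD on `W` directly, without the `np.sqrt(fisher)` factor.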

Problem

Research questions and friction points this paper is trying to address.

LoRA

initialization

subspace selection

data distribution

parameter adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher information

data-aware initialization

LoRA