GRIT - Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation

📅 2026-01-01
🏛️ arXiv.org
🤖 AI Summary
This work addresses a key limitation of existing parameter-efficient fine-tuning methods such as LoRA, which ignore the geometric structure of the loss landscape by optimizing within a fixed low-rank subspace, often leading to parameter drift and suboptimal update efficiency. To overcome this, the authors propose GRIT, a dynamic geometry-aware variant of LoRA that incorporates K-FAC preconditioned gradients, a Fisher information–guided dynamic basis re-projection mechanism, and adaptive rank adjustment based on spectral distribution to geometrically steer the optimization trajectory. Experiments on LLaMA-family models demonstrate that GRIT reduces trainable parameters by 46% on average (ranging from 25% to 80% across tasks) while matching or surpassing the performance of LoRA and QLoRA on multiple benchmarks. Moreover, GRIT significantly mitigates parameter drift and improves the trade-off between model updating and knowledge retention.
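The first mechanism, K-FAC preconditioning, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it only shows the standard K-FAC idea applied to LoRA factor gradients: approximate the Fisher of a linear layer by Kronecker factors built from input activations and backpropagated output gradients, then precondition each low-rank factor on the side of the layer it touches. All sizes, batches, and gradients below are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 3            # hypothetical layer sizes and LoRA rank

# LoRA factors: delta W = B @ A, with B (d_out x r) and A (r x d_in).
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))

# K-FAC Kronecker factors estimated from minibatch statistics:
# A_cov from layer inputs, G_cov from backpropagated output gradients.
x = rng.normal(size=(32, d_in))     # placeholder layer inputs
g = rng.normal(size=(32, d_out))    # placeholder output gradients
A_cov = x.T @ x / len(x)
G_cov = g.T @ g / len(g)

# Damped inverses act as the natural-gradient preconditioner.
damping = 1e-3
A_inv = np.linalg.inv(A_cov + damping * np.eye(d_in))
G_inv = np.linalg.inv(G_cov + damping * np.eye(d_out))

# Raw loss gradients w.r.t. the LoRA factors (placeholder values).
grad_B = rng.normal(size=(d_out, r))
grad_A = rng.normal(size=(r, d_in))

# Natural-gradient proxy: output-side curvature hits B, input-side hits A.
nat_grad_B = G_inv @ grad_B
nat_grad_A = grad_A @ A_inv

lr = 1e-2
B -= lr * nat_grad_B
A -= lr * nat_grad_A
```

The point of the preconditioning step is that update magnitude is rescaled by local curvature, so sharp directions take smaller steps than flat ones, rather than the uniform step a first-order optimizer would take.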

📝 Abstract
Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subspaces with first-order descent, mostly ignoring local loss curvature. This can inflate the effective update budget and amplify drift along weakly constrained directions. We introduce GRIT, a dynamic, curvature-aware LoRA procedure that preserves the LoRA parameterization but: (1) preconditions gradients in rank space using K-FAC as a natural-gradient proxy; (2) periodically reprojects the low-rank basis onto dominant Fisher eigendirections to suppress drift; and (3) adapts the effective rank from the spectrum so capacity concentrates where signal resides. Across instruction-following, comprehension, and reasoning benchmarks on LLaMA backbones, GRIT matches or surpasses LoRA and QLoRA while reducing trainable parameters by 46% on average (25--80% across tasks), without practical quality loss across prompt styles and data mixes. To model forgetting, we fit a curvature-modulated power law. Empirically, GRIT yields lower drift and a better updates-vs-retention frontier than strong PEFT-optimizer baselines (Orthogonal-LoRA, IA3, DoRA, Eff-FT, Shampoo).
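The other two mechanisms named in the abstract, Fisher-guided basis reprojection and spectrum-driven rank adaptation, can be sketched schematically. This is an assumption-laden illustration, not the authors' algorithm: it uses an output-side Fisher estimate built from placeholder gradients, projects the LoRA `B` basis onto the top Fisher eigendirections to suppress drift, and picks an effective rank from the singular-value energy of the merged update. The eigendirection count `k` and the 90% energy threshold are made-up hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 8, 6, 4
B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))

# Output-side Fisher estimate from backpropagated gradients (placeholder batch).
g = rng.normal(size=(64, d_out))
F = g.T @ g / len(g)

# (2) Reprojection: keep only the components of the B basis lying in the
# span of the top-k Fisher eigendirections; drift outside that span is dropped.
k = 5                                          # hypothetical cutoff
eigvals, eigvecs = np.linalg.eigh(F)           # eigenvalues in ascending order
U_top = eigvecs[:, -k:]                        # dominant eigendirections
B = U_top @ (U_top.T @ B)                      # orthogonal projection of basis

# (3) Rank adaptation: read the effective rank off the spectrum of the
# merged update delta W = B @ A, keeping 90% of spectral energy.
s = np.linalg.svd(B @ A, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r_eff = int(np.searchsorted(energy, 0.90) + 1) # smallest rank reaching 90%
```

In this sketch `r_eff` would then set the active rank for subsequent steps, concentrating capacity where the spectrum says signal resides, while the projection keeps the basis aligned with curvature-relevant directions.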
Problem

Research questions and friction points this paper is trying to address.

parameter-efficient fine-tuning
loss curvature
low-rank adaptation
gradient drift
geometry-agnostic optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometry-aware PEFT
K-FAC preconditioning
Fisher-guided reprojection
Dynamic rank adaptation
Low-rank fine-tuning
Authors
Pritish Saha - RAAPID Lab, USA
Chandrav Rajbangshi - RAAPID Lab, USA
Rudra Goyal - RAAPID Lab, USA
Mohit Goyal - Engineer at Google (Machine Learning)
Anurag Deo - RAAPID Lab, USA
Biswajit Roy - RAAPID Lab, USA
Ningthoujam Dhanachandra Singh - RAAPID Lab, USA
Raxit Goswami - VP of Research at Raapid (Natural Language Processing)
Amitava Das - Pragya Lab, BITS Pilani, Goa