FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation shared by full fine-tuning and parameter-efficient methods such as LoRA: both apply weight updates without regard to the spectral structure of the pre-trained weights, so noisy gradients from limited fine-tuning data can perturb robust pre-trained features. The authors propose FuRA (Full-Rank Adaptation), which brings spectral preconditioning to efficient fine-tuning. Using a block tensor-train factorization W = LSR built from a block-wise singular value decomposition (SVD), FuRA freezes the large core L at the pre-trained singular basis and updates only the compact core R and the block-wise singular values S. This preserves full-rank update expressivity while keeping parameter, memory, and step-time costs comparable to LoRA, and it extends to a 4-bit quantized variant, QFuRA. FuRA consistently outperforms full fine-tuning across LLM fine-tuning (+1.37 on LLaMA-3-8B commonsense reasoning), LLM reinforcement learning for mathematical reasoning, and visual instruction tuning for VLMs, while QFuRA surpasses QLoRA.

📝 Abstract

Both full fine-tuning (Full FT) and parameter-efficient fine-tuning methods such as LoRA introduce weight updates without accounting for the spectral structure established during pretraining. As a result, noisy gradients from limited fine-tuning data can perturb robust pretrained features. We identify spectral preconditioning as the missing ingredient: reparameterizing each weight matrix through its full-rank singular value decomposition (SVD) and freezing one singular basis constrains updates to the pretrained column space, yielding a preconditioned optimization scheme that outperforms unconstrained Full FT at the same trainable parameter count. Building on this insight, we propose FuRA (Full-Rank Adaptation), an efficient full-rank adaptation framework based on a block tensor-train factorization W = LSR, where the large core L is fixed to the pretrained block-wise SVD basis, while only the compact core R and the block-wise singular values S are optimized. This design simultaneously provides full-rank spectral preconditioning, preserves full-rank update expressivity, and achieves parameter, memory, and step-time efficiency comparable to LoRA. FuRA consistently outperforms Full FT across multiple settings, including LLM fine-tuning (+1.37 on LLaMA-3-8B commonsense reasoning), LLM reinforcement learning for mathematical reasoning, and visual instruction tuning for VLMs. Furthermore, the 4-bit quantized variant, QFuRA, also surpasses QLoRA. Code is available at https://github.com/olokevin/FuRA-NIPS
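The core idea in the abstract, reparameterizing a weight through its SVD and freezing one singular basis, can be illustrated with a minimal NumPy sketch. This is not the FuRA implementation: it uses a plain (non-block) thin SVD rather than the block tensor-train factorization, and the matrix sizes, toy regression loss, learning rate, and all variable and function names are assumptions made for illustration. It only shows that, with L frozen, every update to W = LSR stays inside the column space of the pretrained basis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" weight. It is tall (m > n), so its column space
# is a proper subspace of R^m and the constraint is visible.
m, n = 8, 4
W0 = rng.standard_normal((m, n))

# Thin SVD: W0 = U @ diag(s) @ Vt, where U is (m, n) with orthonormal columns.
U, s, Vt = np.linalg.svd(W0, full_matrices=False)


def reconstruct(L, s_vec, R):
    """Rebuild W = L S R with S = diag(s_vec)."""
    return L @ (s_vec[:, None] * R)


# L = U is frozen; only the singular values and the right factor are trained.
s_train = s.copy()
R_train = Vt.copy()

# Toy objective (an assumption): 0.5 * ||W x - y||^2 on a single sample.
x = rng.standard_normal(n)
y = rng.standard_normal(m)


def loss_and_grads(s_vec, R):
    W = reconstruct(U, s_vec, R)
    err = W @ x - y
    G = np.outer(err, x)                # dLoss/dW, shape (m, n)
    P = U.T @ G                         # gradient expressed in the frozen basis
    grad_s = np.sum(P * R, axis=1)      # dLoss/ds_i = sum_j P_ij * R_ij
    grad_R = s_vec[:, None] * P         # dLoss/dR = diag(s) @ U^T @ G
    return 0.5 * err @ err, grad_s, grad_R


lr = 1e-3
loss_start, _, _ = loss_and_grads(s_train, R_train)
for _ in range(200):
    _, g_s, g_R = loss_and_grads(s_train, R_train)
    s_train -= lr * g_s
    R_train -= lr * g_R
loss_end, _, _ = loss_and_grads(s_train, R_train)

# The accumulated update has no component outside span(U).
delta_W = reconstruct(U, s_train, R_train) - W0
outside_norm = np.linalg.norm(delta_W - U @ (U.T @ delta_W))

print("loss:", loss_start, "->", loss_end)
print("update norm outside pretrained column space:", outside_norm)
print("trainable params:", s_train.size + R_train.size, "frozen params:", U.size)
```

In this sketch the trainable parameters are the n singular values plus the n-by-n right factor, while the m-by-n basis stays fixed. How the paper's block-wise factorization shrinks the trainable core further is not shown here.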

Problem

Research questions and friction points this paper is trying to address.

spectral structure

parameter-efficient fine-tuning

pretraining

weight updates

noisy gradients

Innovation

Methods, ideas, or system contributions that make the work stand out.

spectral preconditioning

full-rank adaptation

parameter-efficient fine-tuning