Pretraining Induces a Reusable Spectral Basis for Downstream Task Adaptation

📅 2026-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study asks why some parameter directions stay fixed during fine-tuning and what that stability implies for transfer. Through spectral analysis across vision and language models, the authors find that the leading singular vectors of pretrained weight matrices remain highly stable under fine-tuning and are shared across unrelated downstream tasks, suggesting that pretraining establishes a reusable spectral basis. Models pretrained on larger datasets show greater spectral stability under distribution shift or task change, which links pretraining scale to geometric transferability. Building on these observations, the work proposes a parameter-efficient fine-tuning method that freezes the pretrained singular vectors and optimizes only the leading spectral coefficients; on the GLUE benchmark it achieves competitive performance with 0.2% trainable parameters. The authors conclude that the stable directions encode transferable structure rather than irrelevant noise.
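One way to make "stable singular vectors" concrete is to compare the leading singular subspaces of a weight matrix before and after fine-tuning. The sketch below is only an illustration and not the authors' metric: the function name `topk_subspace_overlap`, the choice of mean squared cosine of principal angles, and the synthetic matrices standing in for pretrained and fine-tuned weights are all assumptions.

```python
import numpy as np

def topk_subspace_overlap(w_a, w_b, k):
    """Mean squared cosine of the principal angles between the top-k
    left singular subspaces of w_a and w_b (1.0 means identical subspaces)."""
    u_a, _, _ = np.linalg.svd(w_a, full_matrices=False)
    u_b, _, _ = np.linalg.svd(w_b, full_matrices=False)
    # Singular values of U_a[:, :k]^T U_b[:, :k] are the cosines of the principal angles.
    cosines = np.linalg.svd(u_a[:, :k].T @ u_b[:, :k], compute_uv=False)
    return float(np.mean(cosines ** 2))

rng = np.random.default_rng(0)
k = 4

# Hypothetical "pretrained" matrix: k dominant directions plus weak noise (assumption).
w_pre = rng.standard_normal((64, k)) @ rng.standard_normal((k, 32))
w_pre += 0.1 * rng.standard_normal((64, 32))

# Hypothetical "fine-tuned" matrix: a small perturbation of the pretrained one (assumption).
w_ft = w_pre + 0.01 * rng.standard_normal((64, 32))

# Unrelated random matrix as a baseline for chance-level overlap.
w_rand = rng.standard_normal((64, 32))

overlap_ft = topk_subspace_overlap(w_pre, w_ft, k)
overlap_rand = topk_subspace_overlap(w_pre, w_rand, k)
print(f"pretrained vs fine-tuned: {overlap_ft:.4f}")
print(f"pretrained vs random:     {overlap_rand:.4f}")
```

An overlap near 1 for the perturbed matrix and a much lower value for the random baseline is the qualitative pattern the summary describes; on real checkpoints the same function could be applied per layer.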
📝 Abstract
Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge transfers. Through systematic spectral analysis across vision and language models, we show that the leading singular vectors of pretrained weight matrices remain highly stable under finetuning and are shared across unrelated downstream tasks, revealing that pretraining establishes a reusable spectral coordinate system. Models pretrained on larger datasets exhibit greater spectral stability under distribution shift or task change, directly linking pretraining scale to geometric transferability. Motivated by these findings, we propose a parameter-efficient method that freezes pretrained singular vectors and optimizes only leading spectral coefficients, achieving competitive performance on GLUE with 0.2% trainable parameters. Our results reveal that the stable directions encode transferable structure rather than irrelevant noise: successful pretraining discovers spectral bases that downstream tasks inherit and operate within.
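The idea of freezing singular vectors and training only leading spectral coefficients can be sketched on a single linear map. This is a toy illustration under stated assumptions, not the paper's implementation: the matrix sizes, the value of `k`, the squared-error objective, the learning rate, and the synthetic target (which differs from the "pretrained" matrix only in its top-k singular values) are all hypothetical, and plain NumPy gradient descent stands in for whatever training setup the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained weight matrix (assumption: 16 outputs, 12 inputs).
w_pre = rng.standard_normal((16, 12))
u, s, vt = np.linalg.svd(w_pre, full_matrices=False)  # u and vt stay frozen

k = 3  # number of leading spectral coefficients that are trainable (assumption)

# Synthetic downstream task: same singular vectors, different top-k singular values.
s_target = s.copy()
s_target[:k] *= 1.5
w_target = (u * s_target) @ vt
x = rng.standard_normal((12, 256))
y = w_target @ x

def loss_and_grad(coeffs):
    """Mean squared-error loss and its gradient w.r.t. the top-k coefficients."""
    w = (u * coeffs) @ vt                      # W = U diag(coeffs) V^T
    resid = w @ x - y
    loss = 0.5 * np.mean(np.sum(resid ** 2, axis=0))
    grad_w = resid @ x.T / x.shape[1]          # dL/dW
    # dL/ds_i = u_i^T (dL/dW) v_i for each trainable coefficient i < k
    grad_s = np.einsum("ik,ij,kj->k", u[:, :k], grad_w, vt[:k, :])
    return loss, grad_s

coeffs = s.copy()
initial_loss, _ = loss_and_grad(coeffs)
for _ in range(200):
    _, g = loss_and_grad(coeffs)
    coeffs[:k] -= 0.5 * g                      # only the leading k coefficients move
final_loss, _ = loss_and_grad(coeffs)

trainable_fraction = k / w_pre.size
print(f"loss: {initial_loss:.4f} -> {final_loss:.2e}")
print(f"trainable fraction of this matrix: {trainable_fraction:.4f}")
```

Because the singular vectors are orthonormal, the gradient with respect to each coefficient reduces to a single bilinear form, so the number of trainable values per matrix is `k` rather than the full parameter count. The fraction printed here is specific to the toy sizes and is unrelated to the 0.2% figure reported for GLUE.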
Problem

Research questions and friction points this paper is trying to address.

pretraining
finetuning
spectral stability
parameter subspace
knowledge transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

spectral stability
pretraining
parameter-efficient finetuning
singular vectors
transferability
Junjie Yu
Southern University of Science and Technology
Deep Learning, Neuroscience
Yue Wang
Department of Biomedical Engineering, Southern University of Science and Technology; Department of Psychology, The University of Hong Kong
Zihan Deng
Department of Biomedical Engineering, Southern University of Science and Technology
Yan Zhu
Department of Biomedical Engineering, Southern University of Science and Technology
Wenxiao Ma
Department of Biomedical Engineering, Southern University of Science and Technology
Quanying Liu
Department of Biomedical Engineering, Southern University of Science and Technology