🤖 AI Summary
This study investigates why certain parameter directions remain stable during fine-tuning and how that stability relates to task transferability. Through spectral analysis of vision and language models, the authors find that the dominant singular vectors of pretrained weight matrices change little under fine-tuning and are shared across unrelated downstream tasks, suggesting that pretraining establishes a reusable spectral basis. Models pretrained on larger datasets show greater spectral stability under distribution shift or task change, which links pretraining scale to geometric transferability. Building on these observations, the work argues that the stable spectral directions encode transferable task structure rather than irrelevant noise. It then proposes a parameter-efficient fine-tuning method that freezes the pretrained singular vectors and optimizes only the leading spectral coefficients. On the GLUE benchmark, this method achieves competitive performance with only 0.2% of parameters trainable, supporting the claim that the spectral basis is reusable.
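The summary does not say how "stability" of the dominant singular vectors is measured. One common choice, assumed here rather than taken from the paper, is the overlap between the top-k left singular subspaces of a weight matrix before and after fine-tuning. This is the mean squared cosine of the principal angles between them. The minimal NumPy sketch below uses a hypothetical `top_k_subspace_overlap` helper together with synthetic matrices. The spectrum, noise scale, and k are illustrative assumptions, not values from the study.

```python
import numpy as np


def top_k_subspace_overlap(w_a, w_b, k):
    """Overlap in [0, 1] between the top-k left singular subspaces of two matrices.

    Computed as ||U_a^T U_b||_F^2 / k, the mean squared cosine of the principal
    angles between the two k-dimensional subspaces (1 means identical subspaces).
    Squaring makes the result insensitive to the sign ambiguity of singular vectors.
    """
    u_a = np.linalg.svd(w_a, full_matrices=False)[0][:, :k]
    u_b = np.linalg.svd(w_b, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(u_a.T @ u_b, "fro") ** 2 / k)


rng = np.random.default_rng(0)
d, k = 64, 8

# Hypothetical "pretrained" weight: random orthogonal bases and a spectrum with
# a clear gap after the top-k singular values.
q_left, _ = np.linalg.qr(rng.standard_normal((d, d)))
q_right, _ = np.linalg.qr(rng.standard_normal((d, d)))
spectrum = np.concatenate([np.linspace(10.0, 5.0, k), 0.5 * 0.9 ** np.arange(d - k)])
w_pre = q_left @ np.diag(spectrum) @ q_right.T

# Hypothetical small fine-tuning update, plus an unrelated random matrix as a baseline.
w_ft = w_pre + 0.05 * rng.standard_normal((d, d))
w_rand = rng.standard_normal((d, d))

print("pretrained vs fine-tuned:", top_k_subspace_overlap(w_pre, w_ft, k))
print("pretrained vs random:    ", top_k_subspace_overlap(w_pre, w_rand, k))
```

In this toy setup, the small update leaves the top-k subspace almost unchanged, so the first overlap should be close to 1. The random baseline should land roughly near k/d = 0.125. Real weight matrices have less idealized spectra, so this only illustrates the kind of quantity a spectral-stability analysis might track.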
📝 Abstract
Finetuning pretrained models occurs in a low-dimensional subspace of the full parameter space. Prior work has focused on characterizing this optimization subspace, but largely ignored the complementary question: why do certain directions remain unexplored during finetuning? Are these stable directions irrelevant to downstream tasks, or do they already encode task-relevant structure that requires no further adjustment? Answering this question is central to understanding how pretrained knowledge transfers. Through systematic spectral analysis across vision and language models, we show that the leading singular vectors of pretrained weight matrices remain highly stable under finetuning and are shared across unrelated downstream tasks, revealing that pretraining establishes a reusable spectral coordinate system. Models pretrained on larger datasets exhibit greater spectral stability under distribution shift or task change, directly linking pretraining scale to geometric transferability. Motivated by these findings, we propose a parameter-efficient method that freezes pretrained singular vectors and optimizes only leading spectral coefficients, achieving competitive performance on GLUE with 0.2% trainable parameters. Our results reveal that the stable directions encode transferable structure rather than irrelevant noise: successful pretraining discovers spectral bases that downstream tasks inherit and operate within.
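The abstract does not give implementation details for the proposed method. The minimal NumPy sketch below assumes that "leading spectral coefficients" means learned offsets to the top-r singular values of a pretrained weight matrix, with the singular vectors frozen. The class name `SpectralCoefficientLayer`, the rank r, the toy regression task, and the learning rate are all hypothetical. The paper's actual parameterization may differ.

```python
import numpy as np


class SpectralCoefficientLayer:
    """Hypothetical linear layer W = W_pre + U_r diag(c) V_r^T.

    The pretrained weight and its leading singular vectors U_r, V_r are frozen;
    only the r spectral coefficients c are trainable.
    """

    def __init__(self, w_pre, r):
        u, _, vt = np.linalg.svd(w_pre, full_matrices=False)
        self.w_pre = w_pre
        self.u_r = u[:, :r]   # frozen leading left singular vectors (d_out x r)
        self.v_r = vt[:r].T   # frozen leading right singular vectors (d_in x r)
        self.c = np.zeros(r)  # trainable coefficients, initialized to "no change"

    def weight(self):
        # (u_r * c) scales column i of U_r by c_i, i.e. U_r diag(c).
        return self.w_pre + (self.u_r * self.c) @ self.v_r.T

    def forward(self, x):
        return x @ self.weight().T

    def grad_c(self, x, y):
        # Gradient of L = ||x W^T - y||_F^2 / (2n) w.r.t. W, then projected onto
        # each frozen rank-one direction: dL/dc_i = u_i^T G v_i.
        n = x.shape[0]
        g_w = (self.forward(x) - y).T @ x / n
        return np.sum(self.u_r * (g_w @ self.v_r), axis=0)


rng = np.random.default_rng(0)
d_out, d_in, r, n = 32, 32, 4, 512
w_pre = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
layer = SpectralCoefficientLayer(w_pre, r)

# Hypothetical downstream task whose ideal update lies inside the frozen spectral basis.
true_delta = np.array([0.5, -0.3, 0.2, 0.1])
w_task = w_pre + (layer.u_r * true_delta) @ layer.v_r.T
x = rng.standard_normal((n, d_in))
y = x @ w_task.T

# Plain gradient descent on the r coefficients only.
for _ in range(300):
    layer.c -= 0.5 * layer.grad_c(x, y)

print("learned coefficients:", layer.c)
print(f"trainable: {layer.c.size} of {w_pre.size} ({layer.c.size / w_pre.size:.2%})")
# prints "trainable: 4 of 1024 (0.39%)"
```

U_r and V_r have orthonormal columns, so each coefficient moves the weight along one fixed rank-one direction u_i v_i^T. As a result, the layer trains r numbers instead of d_out·d_in. The toy task is built so that the needed update lies in that basis, which is the situation the abstract argues successful pretraining creates. A task that requires directions outside the frozen basis could not be fit exactly by this parameterization.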