🤖 AI Summary
To address the prohibitively high computational cost of full-parameter fine-tuning (FPT) for large language models (LLMs), this paper proposes a semantics-aware layer freezing strategy. It is the first to model each layer’s contribution to loss reduction from the perspective of semantic evolution in hidden representations. By analyzing hidden-space transition trajectories and layer-wise bias sensitivity, and integrating scaling laws, the method derives a layer-wise gain estimation that enables interpretable, dynamic selection of trainable layers (“where to fine-tune”). Crucially, backward propagation is omitted for frozen layers, substantially reducing training memory and FLOPs—up to 62% savings. Extensive experiments across multiple LLMs and downstream datasets demonstrate that the approach maintains or even surpasses the performance of both FPT and leading parameter-efficient fine-tuning (PEFT) methods. The core innovation lies in unifying semantic evolution modeling with scaling-law-driven gain quantification, providing both theoretical grounding and a practical framework for efficient LLM adaptation.
📝 Abstract
Finetuning language models (LMs) is crucial for adapting them to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on *how to finetune* but neglects the question of *where to finetune*. As a pioneering effort to reduce the cost of backpropagation (at the layer level) by answering where to finetune, we conduct a semantic analysis of the LM inference process. We first propose using transition traces of the latent representation to compute deviations (or loss). Then, using a formula derived from scaling laws, we estimate each layer's gain in reducing deviation (or loss). Further, we narrow down the scope for finetuning and study the cost-benefit balance of LM finetuning. We perform extensive experiments across well-known LMs and datasets. The results show that our approach is effective and efficient, outperforming existing baselines. Our approach is orthogonal to other techniques for improving finetuning efficiency, such as PEFT methods, offering practical value for LM finetuning.
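To make the "where to finetune" idea concrete, the selection step can be sketched as ranking layers by an estimated gain and marking only the top-ranked ones as trainable. The deviation values and the simple top-k rule below are illustrative assumptions, not the paper's actual scaling-law formula:

```python
# Hypothetical sketch of layer selection for semantics-aware freezing.
# Assumption: each layer's gain is proportional to the deviation (loss
# contribution) measured along its hidden-state transition trace; the
# paper's derived scaling-law estimate would replace this proxy.

def select_trainable_layers(layer_deviations, k):
    """Return the indices of the k layers with the largest estimated gain;
    all other layers would be frozen (no backward pass through them)."""
    ranked = sorted(range(len(layer_deviations)),
                    key=lambda i: layer_deviations[i],
                    reverse=True)
    return sorted(ranked[:k])

# Toy per-layer deviations for an 8-layer model (assumed values).
deviations = [0.02, 0.15, 0.40, 0.05, 0.31, 0.08, 0.22, 0.01]
print(select_trainable_layers(deviations, 3))  # -> [2, 4, 6]
```

In a real training loop, the returned indices would determine which transformer blocks keep gradients enabled, while the remaining layers skip backpropagation entirely, which is where the reported memory and FLOP savings come from.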