Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

πŸ“… 2025-09-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study investigates the mechanistic impact of supervised fine-tuning (SFT) on the knowledge structure of large language models (LLMs). Addressing the lack of clarity and controllability in SFT-induced knowledge evolution, the authors systematically analyze performance changes across varying fine-tuning scales and knowledge mastery levels on LLaMA-2/3 models, using closed-book question answering (CBQA) as the evaluation task. They combine token-level output analysis with parameter-level update tracking to characterize knowledge dynamics. The findings reveal that up to 90% of parameter updates during SFT contribute negligibly to knowledge enhancement, and that reverting these non-contributing updates to their pre-SFT values can improve CBQA performance. Strikingly, more data is not always better: models fine-tuned on 240 samples outperform those fine-tuned on 1,920 samples by up to 14%, while varying the knowledge mastery of the fine-tuning data induces over 12% performance variance. By uncovering this non-uniform coupling between parameter updates and knowledge evolution in SFT, the work points toward controllable, targeted knowledge editing in LLMs.

πŸ“ Abstract
Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.
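The abstract's central finding is that most SFT parameter updates do not contribute to knowledge, and that restoring them (reverting parameters to their pre-SFT values) can help. A minimal sketch of that idea, keeping only the largest-magnitude updates and reverting the rest; the function name, `keep_ratio` parameter, and flat-list parameter representation are illustrative assumptions, not the paper's actual procedure:

```python
def restore_minor_updates(theta_base, theta_sft, keep_ratio=0.1):
    """Revert the smallest-magnitude SFT updates back to base values.

    keep_ratio: fraction of updates (by |delta|) to keep. The paper reports
    up to 90% of updates contribute negligibly, so 0.1 is a plausible
    default; in practice the cutoff would be chosen per model and task.
    """
    deltas = [s - b for b, s in zip(theta_base, theta_sft)]
    n_keep = max(1, int(len(deltas) * keep_ratio))
    # indices of the n_keep largest-magnitude updates
    order = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    keep = set(order[:n_keep])
    # keep the critical updates, revert everything else to the base value
    return [theta_sft[i] if i in keep else theta_base[i]
            for i in range(len(theta_sft))]

# Toy example: one large update (index 1), four tiny ones.
base = [0.0, 1.0, -0.5, 2.0, 0.3]
sft = [0.01, 1.8, -0.49, 2.02, 0.31]
restored = restore_minor_updates(base, sft, keep_ratio=0.2)
print(restored)  # [0.0, 1.8, -0.5, 2.0, 0.3]
```

In a real setting the same selection would be applied per tensor over a model's state dict rather than over a flat list, and the criterion for "contributing" updates in the paper is derived from their knowledge analysis, not raw magnitude alone.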
Problem

Research questions and friction points this paper is trying to address.

Evaluating how supervised fine-tuning affects language model knowledge retention
Investigating why minimal fine-tuning data sometimes outperforms extensive datasets
Analyzing parameter updates that don't contribute to knowledge improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated CBQA performance across five LLaMA-2 and LLaMA-3 models
Analyzed model behavior at both the token and parameter levels
Identified parameter updates during SFT that do not contribute to knowledge enhancement
Junjie Ye
Fudan University
Yuming Yang
Fudan University
Yang Nan
Fudan University
Shuo Li
Fudan University
Qi Zhang
Fudan University, Shanghai Key Lab of Intelligent Information Processing
Tao Gui
Fudan University, Shanghai Key Lab of Intelligent Information Processing, Shanghai Innovation Institute
Xuanjing Huang
Fudan University, Shanghai Key Lab of Intelligent Information Processing
Peng Wang
Lenovo Research, Beijing, China
Zhongchao Shi
Lenovo Research, Beijing, China
Jianping Fan
AI Lab at Lenovo Research