Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

πŸ“… 2025-09-20
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study investigates the mechanistic impact of supervised fine-tuning (SFT) on the knowledge structure of large language models (LLMs). Addressing the lack of clarity and controllability in SFT-induced knowledge evolution, the authors systematically analyze performance changes across varying fine-tuning scales and knowledge mastery levels on LLaMA-2/3 models, using closed-book question answering (CBQA) as the evaluation task. They combine token-level output analysis with parameter-level update tracking to characterize knowledge dynamics. The findings reveal that up to 90% of parameter updates during SFT contribute negligibly to knowledge enhancement, and that reverting these non-contributing updates to their pre-SFT values can improve CBQA performance. Strikingly, more data is not always better: models fine-tuned on 240 samples outperform those fine-tuned on 1,920 samples by up to 14%, while varying the knowledge mastery of the fine-tuning data induces over 12% performance variance. By uncovering this non-uniform coupling between parameter updates and knowledge evolution in SFT, the work points toward controllable, targeted knowledge editing in LLMs.

πŸ“ Abstract
Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.
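The abstract's central finding is that most SFT parameter updates do not contribute to knowledge, and that restoring them (reverting parameters to their pre-SFT values) can help. A minimal sketch of that idea, keeping only the largest-magnitude updates and reverting the rest; the function name, `keep_ratio` parameter, and flat-list parameter representation are illustrative assumptions, not the paper's actual procedure:

```python
def restore_minor_updates(theta_base, theta_sft, keep_ratio=0.1):
    """Revert the smallest-magnitude SFT updates back to base values.

    keep_ratio: fraction of updates (by |delta|) to keep. The paper reports
    up to 90% of updates contribute negligibly, so 0.1 is a plausible
    default; in practice the cutoff would be chosen per model and task.
    """
    deltas = [s - b for b, s in zip(theta_base, theta_sft)]
    n_keep = max(1, int(len(deltas) * keep_ratio))
    # indices of the n_keep largest-magnitude updates
    order = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    keep = set(order[:n_keep])
    # keep the critical updates, revert everything else to the base value
    return [theta_sft[i] if i in keep else theta_base[i]
            for i in range(len(theta_sft))]

# Toy example: one large update (index 1), four tiny ones.
base = [0.0, 1.0, -0.5, 2.0, 0.3]
sft = [0.01, 1.8, -0.49, 2.02, 0.31]
restored = restore_minor_updates(base, sft, keep_ratio=0.2)
print(restored)  # [0.0, 1.8, -0.5, 2.0, 0.3]
```

In a real setting the same selection would be applied per tensor over a model's state dict rather than over a flat list, and the criterion for "contributing" updates in the paper is derived from their knowledge analysis, not raw magnitude alone.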
Problem

Research questions and friction points this paper is trying to address.

Evaluating how supervised fine-tuning affects language model knowledge retention
Investigating why minimal fine-tuning data sometimes outperforms extensive datasets
Analyzing parameter updates that don't contribute to knowledge improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated CBQA performance across five LLaMA-2 and LLaMA-3 models
Analyzed model behavior at both the token and parameter levels
Identified parameter updates during SFT that do not contribute to knowledge enhancement
Junjie Ye
Fudan University
Yuming Yang
Fudan University
Yang Nan
Fudan University
Shuo Li
Fudan University
Qi Zhang
Fudan University, Shanghai Key Lab of Intelligent Information Processing
Tao Gui
Fudan University, Shanghai Key Lab of Intelligent Information Processing, Shanghai Innovation Institute
Xuanjing Huang
Fudan University, Shanghai Key Lab of Intelligent Information Processing
Peng Wang
Lenovo Research, Beijing, China
Zhongchao Shi
Lenovo Research, Beijing, China
Jianping Fan
AI Lab at Lenovo Research