Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study investigates why supervised fine-tuning (SFT) is broadly effective for small models yet yields inconsistent or even detrimental results in large language models (LLMs). It analyzes how inter-token interactions evolve during SFT, using interaction-based interpretability techniques to quantify and track changes in interaction strength. The findings indicate that, in LLMs, SFT acts primarily as a denoising mechanism within an extremely short training window, after which continued fine-tuning tends to introduce overfitted interactions and performance degrades. Validated across multiple LLMs and datasets, this account clarifies the limits of SFT effectiveness and offers practical guidance on early stopping during fine-tuning.
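To make "interaction strength" concrete, here is a minimal sketch of one definition commonly used in interaction-based explanation work, the Harsanyi interaction. Whether the paper uses exactly this form is an assumption; the names `harsanyi_interaction` and `toy_score` are hypothetical, and `toy_score` stands in for a model's output on an input where only the tokens in `T` are left unmasked.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def harsanyi_interaction(v, S):
    """Harsanyi interaction I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T).

    `v` maps a frozenset of token indices to a scalar score. Using this as the
    paper's interaction metric is an assumption made for illustration.
    """
    S = frozenset(S)
    return sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))


def toy_score(T):
    """Hypothetical scoring function: fires only when tokens 0 and 1 co-occur."""
    return 1.0 if {0, 1} <= T else 0.0


# Interaction strength is the magnitude |I(S)| for each token subset S.
strengths = {S: abs(harsanyi_interaction(toy_score, S)) for S in subsets(range(3))}
```

In this toy case only the subset {0, 1} carries non-zero strength. Tracking such strengths across fine-tuning checkpoints is the kind of analysis the summary describes, although the cost grows exponentially with the number of tokens, so practical use needs sampling or a small token set.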
📝 Abstract
This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks, yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs. We find that the evolution of interactions during SFT can effectively explain the inconsistent effectiveness of SFT for LLMs. Specifically, we find that (1) SFT primarily removes noise-like interactions, while rarely acquiring reliable new interactions, and (2) this denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions. We validate these findings across multiple LLMs and datasets. Our findings provide new insights into early stopping and offer practical guidance for LLM training.
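As an illustration of the early-stopping guidance, the following is a minimal sketch of a generic patience-based rule applied to validation losses. It is not the paper's stopping criterion; the function name `early_stop_step` and the parameters `patience` and `min_delta` are hypothetical. Because the abstract describes the useful stage as extremely brief, the sketch assumes validation is evaluated frequently, for example every few optimizer steps.

```python
def early_stop_step(val_losses, patience=2, min_delta=0.0):
    """Return the index of the evaluation whose checkpoint should be kept.

    val_losses: validation losses recorded at successive evaluations.
    patience:   number of evaluations without improvement before stopping.
    min_delta:  minimum decrease in loss that counts as an improvement.
    """
    best_loss = float("inf")
    best_step = -1
    for step, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best checkpoint: remember it and keep training.
            best_loss = loss
            best_step = step
        elif step - best_step >= patience:
            # No improvement for `patience` evaluations: stop here.
            break
    return best_step


# Hypothetical loss curve: a brief improvement followed by degradation.
losses = [1.00, 0.80, 0.85, 0.90, 0.95]
keep = early_stop_step(losses, patience=2)
```

Here `keep` is 1, the evaluation just before the loss starts rising. A small `patience` matches the claim that continued fine-tuning quickly introduces overfitted interactions, but the right value depends on how noisy the validation signal is.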
Problem

Research questions and friction points this paper is trying to address.

supervised fine-tuning
large language models
interaction
effectiveness inconsistency
overfitting
Innovation

Methods, ideas, or system contributions that make the work stand out.

interaction-based explanation
supervised fine-tuning
large language models
noise-like interactions
early stopping