A Layer-wise Analysis of Supervised Fine-Tuning

📅 2026-04-12

🤖 AI Summary

This study addresses two problems in supervised fine-tuning (SFT): the risk of catastrophic forgetting and the poorly understood layer-wise emergence of instruction-following capabilities in language models. Combining information-theoretic, geometric, and optimization metrics across models from 1B to 32B parameters, the work finds that instruction alignment is architecturally localized: middle layers (20%-80% of depth) remain stable, while the final layers are highly sensitive. Building on this finding, the authors propose Mid-Block Efficient Tuning, which updates only these intermediate layers. The method outperforms standard LoRA by up to 10.2% on GSM8K (with OLMo2-7B) with reduced parameter overhead.

📝 Abstract

While critical for alignment, Supervised Fine-Tuning (SFT) incurs the risk of catastrophic forgetting, and the layer-wise emergence of instruction-following capabilities remains poorly understood. We investigate this mechanism through a comprehensive analysis using information-theoretic, geometric, and optimization metrics across model scales (1B-32B). Our experiments reveal a distinct depth-dependent pattern: middle layers (20%-80% of depth) are stable, whereas final layers exhibit high sensitivity. Leveraging this insight, we propose Mid-Block Efficient Tuning, which selectively updates these critical intermediate layers. Empirically, our method outperforms standard LoRA by up to 10.2% on GSM8K (OLMo2-7B) with reduced parameter overhead, demonstrating that effective alignment is architecturally localized rather than distributed. The code is publicly available at https://anonymous.4open.science/r/base_sft.
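
To make the 20%-80% depth band concrete, the sketch below maps it to layer indices for a model with a given number of layers. It is an illustration only, not the authors' implementation: the abstract does not say how band edges are rounded or which modules inside a layer are updated, so the function name `mid_block_layers`, the rounding rule (floor at the lower edge, ceiling at the upper edge), and the 32-layer example are all assumptions.

```python
def mid_block_layers(num_layers: int, lo_pct: int = 20, hi_pct: int = 80) -> list[int]:
    """Return 0-based indices of layers inside the [lo_pct, hi_pct] depth band.

    Hypothetical helper: the rounding convention is an assumption, not
    taken from the paper. Integer arithmetic avoids float rounding issues.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    if not 0 <= lo_pct < hi_pct <= 100:
        raise ValueError("need 0 <= lo_pct < hi_pct <= 100")
    start = (num_layers * lo_pct) // 100       # floor of the lower edge
    end = -((-num_layers * hi_pct) // 100)     # ceiling of the upper edge
    return list(range(start, end))


def trainable_fraction(num_layers: int, selected: list[int]) -> float:
    """Fraction of layers that would receive updates."""
    return len(selected) / num_layers


if __name__ == "__main__":
    # 32 layers is an assumed example depth, not a figure from the paper.
    layers = mid_block_layers(32)
    print(f"update layers {layers[0]}..{layers[-1]}, "
          f"fraction = {trainable_fraction(32, layers):.3f}")
```

In a training script, the returned indices could be used to enable gradients (or attach adapters) only for the selected layers and freeze everything else; how the selection is wired into a specific framework is left open here.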

Problem

Research questions and friction points this paper is trying to address.

Supervised Fine-Tuning

Catastrophic Forgetting

Instruction-Following Capabilities

Layer-wise Analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised Fine-Tuning

Layer-wise Analysis

Catastrophic Forgetting

Parameter-Efficient Tuning