🤖 AI Summary
This study investigates the differential contributions of individual layers in large language models (LLMs) during supervised fine-tuning for alignment.
Method: We propose Importance-aware Layer Adaptation (ILA), a binary mask learning framework to quantitatively assess layer-wise importance.
Contribution/Results: Through systematic analysis, we find—contrary to common assumptions—that alignment primarily reshapes representation styles rather than modifying underlying knowledge. Crucially, key layers identified by ILA exhibit nearly 90% overlap across diverse datasets, demonstrating strong generalizability. Empirically, freezing non-critical layers improves final alignment performance while substantially reducing GPU memory consumption and computational cost. Moreover, fine-tuning only the ILA-identified critical layers achieves 98% of the performance attained by full-model fine-tuning. Our work establishes a new paradigm for efficient, interpretable, and layer-aware LLM alignment.
📝 Abstract
Aligning large language models (LLMs) through supervised fine-tuning is essential for tailoring them to specific applications. Recent studies suggest that alignment primarily adjusts a model's presentation style rather than its foundational knowledge, indicating that only certain components of the model are significantly impacted. To uncover how alignment affects model behavior at a granular level, we propose identifying which layers within LLMs are most critical to the alignment process. Our approach, named ILA, involves learning a binary mask for the parameter changes in each layer during alignment, as an indicator of layer significance. Experimental results reveal that, despite substantial differences in alignment datasets, the important layers of a model identified by ILA exhibit nearly 90% overlap, highlighting fundamental patterns in LLM alignment. The results also indicate that freezing non-essential layers improves overall model performance, while selectively tuning the most critical layers significantly enhances fine-tuning efficiency with minimal performance loss. Finally, we discuss how these findings extend from LLM alignment to reasoning.
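To make the idea concrete, here is a minimal toy sketch of layer-wise binary mask learning in the spirit of ILA: each "layer" has frozen base weights plus a trainable fine-tuning update, and a learnable gate per layer scales that update; a sparsity penalty pushes unneeded gates toward zero, so surviving gates flag the critical layers. The model, the sigmoid relaxation of the binary mask, and all hyperparameters below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class MaskedDeltaModel(nn.Module):
    """Toy stand-in for ILA-style layer importance learning.

    Each 'layer' is a linear map with frozen base weights W0 and a
    trainable update dW (the fine-tuning parameter change).  A learnable
    gate g_l in (0, 1) scales dW_l; after training, layers whose gate
    stays near 0 are deemed non-essential and can be frozen.
    """
    def __init__(self, num_layers=4, dim=8):
        super().__init__()
        # Frozen pre-trained weights (plain tensors: no gradients needed).
        self.base = [torch.randn(dim, dim) * 0.1 for _ in range(num_layers)]
        # Per-layer fine-tuning updates, initialized at zero.
        self.delta = nn.ParameterList(
            nn.Parameter(torch.zeros(dim, dim)) for _ in range(num_layers))
        # Logits of the relaxed binary mask, one gate per layer.
        self.gate_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, x):
        gates = torch.sigmoid(self.gate_logits)  # relaxed binary mask
        for W0, dW, g in zip(self.base, self.delta, gates):
            x = torch.tanh(x @ (W0 + g * dW))    # gated per-layer update
        return x

    def important_layers(self, thresh=0.5):
        """Indices of layers whose gate exceeds the threshold."""
        return (torch.sigmoid(self.gate_logits) > thresh).nonzero().flatten()

model = MaskedDeltaModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = torch.randn(64, 8), torch.randn(64, 8)  # stand-in alignment data

for _ in range(200):
    opt.zero_grad()
    task_loss = ((model(x) - y) ** 2).mean()
    # L1-style sparsity on the mask encourages gates to close where the
    # layer's update does not help the task.
    sparsity = torch.sigmoid(model.gate_logits).sum()
    (task_loss + 1e-3 * sparsity).backward()
    opt.step()

print("important layers:", model.important_layers().tolist())
```

In a real LLM setting the gated quantity would be each transformer layer's parameter change during supervised fine-tuning rather than a toy linear update, and the mask would be discretized (e.g. by thresholding or a straight-through estimator) before freezing the low-gate layers.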