🤖 AI Summary
Despite the empirical success of instruction-following fine-tuning for large language models (LLMs), the underlying mechanisms, particularly the functional roles of instruction-specific sparse computational units (e.g., neurons in dense models, or neurons and experts in Mixture-of-Experts models), remain poorly understood.
Method: We propose SPARCOM, a novel analytical framework, and introduce HexaInst, a balanced six-category instruction dataset, to systematically disentangle and quantify the functional generality, uniqueness, and cross-task transferability of these units. Using sparse localization, causal attribution alignment, and cross-architecture comparison, we identify instruction-specific components across model types.
Contribution/Results: We empirically demonstrate that instruction-specific units exhibit both strong generalization across diverse instructions and high functional specificity, serving as core computational primitives for instruction execution. This advances LLM interpretability and trustworthiness and points toward controllable fine-tuning and mechanism-driven model editing.
📝 Abstract
The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models, and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instruction dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework that makes three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate the functional generality and uniqueness of these components and their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work provides deeper insights into how LLMs internalize instruction-following behavior, for the benefit of the trustworthy-LLM community.
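The abstract mentions a method for identifying instruction-specific sparse components but does not spell it out. As a rough illustration only, the sketch below shows one common localization recipe from the interpretability literature: contrast a unit's mean activation on instruction-specific inputs against its mean activation on general inputs, and keep the top-k most selective units. The function name, the toy activation matrices, and the planted "instruction neurons" are all illustrative assumptions, not SPARCOM's actual procedure.

```python
import numpy as np

def locate_sparse_units(act_instr, act_general, top_k=3):
    """Rank units by the gap between their mean activation on
    instruction-specific inputs and on general inputs, and return
    the indices of the top_k most instruction-selective units.

    act_instr, act_general: (num_samples, num_units) activation matrices.
    NOTE: hypothetical helper for illustration, not the paper's method.
    """
    gap = act_instr.mean(axis=0) - act_general.mean(axis=0)
    return np.argsort(gap)[::-1][:top_k]

# Synthetic activations: 32 samples over 16 units of pure noise...
rng = np.random.default_rng(0)
act_general = rng.normal(0.0, 1.0, size=(32, 16))
act_instr = rng.normal(0.0, 1.0, size=(32, 16))
# ...plus three planted units that fire strongly on instructions.
act_instr[:, [2, 7, 11]] += 5.0

units = locate_sparse_units(act_instr, act_general, top_k=3)
print(sorted(units.tolist()))  # recovers the planted units [2, 7, 11]
```

In a real model, the activation matrices would come from forward passes over the instruction dataset (e.g., HexaInst's six categories) rather than synthetic noise, and the same contrast can be applied per-expert in an MoE by aggregating router or expert activations instead of individual neurons.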