Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Despite empirical success in instruction-following fine-tuning of large language models (LLMs), the underlying mechanisms—particularly the functional roles of instruction-specific sparse computational units (e.g., neurons in dense models or neurons/experts in Mixture-of-Experts models)—remain poorly understood. Method: We propose SPARCOM, a novel analytical framework, and introduce HexaInst, a balanced six-category instruction dataset, to systematically disentangle and quantify the functional generality, uniqueness, and cross-task transferability of these units. Using sparse localization, causal attribution alignment, and cross-architecture comparison, we identify instruction-specific components across model types. Contribution/Results: We empirically demonstrate that instruction-specific units exhibit both strong generalization across diverse instructions and high functional specificity, serving as core computational primitives for instruction execution. This advances LLM interpretability and trustworthiness, establishing a new paradigm for controllable fine-tuning and mechanism-driven model editing.
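The summary names "sparse localization" as one of the techniques but gives no implementation details. As a minimal illustrative sketch (not SPARCOM's actual procedure), instruction-specific neurons can be localized by comparing mean FFN activations on instruction inputs versus general inputs; the function name, ranking scheme, and toy data below are assumptions for illustration only.

```python
import numpy as np

def locate_instruction_neurons(acts_instr, acts_general, top_k=5):
    """Rank neurons by mean activation difference between instruction
    inputs and general inputs (a simple sparse-localization heuristic).

    acts_instr, acts_general: (num_samples, num_neurons) arrays of
    recorded activations. Returns indices of the top_k neurons whose
    activation increases most on instruction inputs.
    """
    diff = acts_instr.mean(axis=0) - acts_general.mean(axis=0)
    return np.argsort(diff)[::-1][:top_k]

# Toy demo: neuron 2 is simulated to fire strongly only on
# "instruction" inputs, so it should rank first.
rng = np.random.default_rng(0)
acts_general = rng.normal(0.0, 0.1, size=(100, 8))
acts_instr = rng.normal(0.0, 0.1, size=(100, 8))
acts_instr[:, 2] += 1.0  # simulated instruction-specific neuron
top = locate_instruction_neurons(acts_instr, acts_general, top_k=1)
```

In practice the activation matrices would be recorded from a model's FFN layers during forward passes over the two input sets; the causal-attribution and cross-architecture steps the summary mentions would build on such a ranking.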

📝 Abstract
The fine-tuning of Large Language Models (LLMs) has significantly advanced their instruction-following capabilities, yet the underlying computational mechanisms driving these improvements remain poorly understood. This study systematically examines how fine-tuning reconfigures LLM computations by isolating and analyzing instruction-specific sparse components, i.e., neurons in dense models and both neurons and experts in Mixture-of-Experts (MoE) architectures. In particular, we introduce HexaInst, a carefully curated and balanced instruction dataset spanning six distinct categories, and propose SPARCOM, a novel analytical framework comprising three key contributions: (1) a method for identifying these sparse components, (2) an evaluation of their functional generality and uniqueness, and (3) a systematic comparison of their alterations. Through experiments, we demonstrate the functional generality and uniqueness of these components and their critical role in instruction execution. By elucidating the relationship between fine-tuning-induced adaptations and sparse computational substrates, this work provides deeper insights into how LLMs internalize instruction-following behavior, for the benefit of the trustworthy-LLM community.
Problem

Research questions and friction points this paper is trying to address.

Understanding computational mechanisms behind LLM instruction-following improvements
Analyzing instruction-specific neurons and experts in dense and MoE models
Developing HexaInst dataset and SPARCOM framework to study fine-tuning adaptations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifying instruction-specific sparse components in LLMs
Introducing HexaInst balanced instructional dataset
Proposing SPARCOM analytical framework for evaluation
Junyan Zhang — National University of Singapore (Large Language Model)
Yubo Gao — The Hong Kong University of Science and Technology (Guangzhou)
Yibo Yan — East China Normal University (High-dimensional Statistics)
Jungang Li — The Hong Kong University of Science and Technology (Guangzhou)
Zhaorui Hou — The Hong Kong University of Science and Technology (Guangzhou)
Sicheng Tao — The Hong Kong University of Science and Technology (Guangzhou)
Shuliang Liu — PhD, HKUST(GZ) (Trustworthy LLM, VLM, Recommendation System)
Song Dai — The Hong Kong University of Science and Technology (Guangzhou)
Yonghua Hei — The Hong Kong University of Science and Technology (Guangzhou)
Junzhuo Li — The Hong Kong University of Science and Technology (Guangzhou), The Hong Kong University of Science and Technology
Xuming Hu — Assistant Professor, HKUST(GZ) / HKUST (Natural Language Processing, Large Language Model)