Tiny Brains, Giant Impact: Uncovering the Keystone Neurons of LLM with Just a Few Prompts

📅 2026-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the opacity of large language models and the unclear origins of their capabilities. Through cross-task activation analysis, the authors uncover, for the first time, a set of sparse, stable “keystone neurons” that emerge during pretraining and are essential for task performance. Building on this finding, they propose an efficient fine-tuning strategy that updates only these keystone neurons. Experimental results demonstrate that this approach matches or even surpasses full-parameter fine-tuning across multiple tasks, while substantially reducing the number of updated parameters and better preserving the model’s general-purpose capabilities.
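The cross-task selection idea in the summary can be illustrated with a minimal sketch. This is not the authors' procedure: the listing does not give the actual selection criterion, so the rule used here (keep neurons that rank in the top fraction of mean activation strength on every task) is an assumption, as are the function name `select_keystone_candidates`, the `top_fraction` parameter, and the toy numbers.

```python
def select_keystone_candidates(activations, top_fraction=0.1):
    """Return indices of neurons that are highly activated on every task.

    activations: dict mapping a task name to a list of per-neuron mean
    absolute activations (all lists must have the same length).
    ASSUMPTION: "consistently highly activated" is approximated as being in
    the top `top_fraction` of neurons for each task; the paper's real
    criterion may differ.
    """
    n_neurons = len(next(iter(activations.values())))
    k = max(1, int(n_neurons * top_fraction))

    per_task_top = []
    for task, acts in activations.items():
        if len(acts) != n_neurons:
            raise ValueError(f"task {task!r} has {len(acts)} neurons, expected {n_neurons}")
        # Rank neuron indices by activation strength, strongest first.
        ranked = sorted(range(n_neurons), key=lambda i: acts[i], reverse=True)
        per_task_top.append(set(ranked[:k]))

    # Keep only neurons that appear in the top-k set of every task.
    return sorted(set.intersection(*per_task_top))


# Hypothetical toy data: 10 neurons measured on 3 tasks.
toy_activations = {
    "math": [0.1, 5.0, 0.2, 4.0, 0.3, 0.1, 0.2, 0.1, 0.0, 0.1],
    "code": [0.2, 4.5, 3.0, 0.1, 0.1, 0.2, 0.1, 0.3, 0.1, 0.0],
    "qa":   [0.0, 6.0, 0.1, 0.2, 2.0, 0.1, 0.3, 0.1, 0.2, 0.1],
}
candidates = select_keystone_candidates(toy_activations, top_fraction=0.2)
```

In this toy setup only neuron 1 is in the top two for all three tasks, so it is the single candidate. A real analysis would collect activations from model forward passes over many prompts per task.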
📝 Abstract
Large language models (LLMs) display strong comprehensive abilities, yet the internal mechanisms that support these behaviors remain insufficiently understood. In this work, we show that across a wide range of open-weight Transformers, a subset of neurons remains consistently highly activated during inference on tasks spanning multiple capability dimensions. By probing along cross-task activation strength, we isolate an extremely sparse subset of neurons whose removal causes a collapse in model behavior; we term these keystone neurons. Our analysis reveals that keystone neurons are a stable and intrinsic neuron subset of the model that is largely established during pretraining. The parameters associated with these neurons are tightly calibrated during training, and their precise values are critical for the capabilities of the model. Building on these insights, we propose a supervised fine-tuning approach that updates only keystone neurons, achieving task gains comparable to or even better than full-parameter fine-tuning while better preserving performance in other capability dimensions, despite modifying a much smaller number of parameters.
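The idea of updating only the parameters tied to selected neurons can be sketched as a masked gradient step. This is an illustration under stated assumptions, not the paper's training recipe: it assumes each neuron owns one row of a weight matrix, uses plain SGD, and the names `masked_sgd_step` and `trainable_rows` are hypothetical.

```python
def masked_sgd_step(weights, grads, trainable_rows, lr=0.01):
    """Apply one SGD step to selected rows only; leave all other rows frozen.

    weights, grads: lists of rows (one row per neuron), same shape.
    trainable_rows: indices of neurons whose parameters may change.
    ASSUMPTION: "parameters associated with a neuron" means one matrix row;
    the abstract does not specify the mapping.
    """
    trainable = set(trainable_rows)
    updated = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in trainable:
            # Standard SGD update for a selected neuron.
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            # Frozen neuron: copy the row unchanged.
            updated.append(list(w_row))
    return updated


# Hypothetical toy example: 3 neurons, only neuron 1 is trainable.
toy_weights = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
toy_grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
new_weights = masked_sgd_step(toy_weights, toy_grads, trainable_rows=[1], lr=0.5)
```

Only the row for neuron 1 changes here; the frozen rows are what would let the rest of the model keep its existing behavior. In a deep-learning framework the same effect is usually obtained by zeroing gradients outside the selected set before the optimizer step.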
Problem

Research questions and friction points this paper is trying to address.

large language models
keystone neurons
model interpretability
neural activation
internal mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

keystone neurons
sparse activation
parameter-efficient fine-tuning
model interpretability
Transformer probing