🤖 AI Summary
This study addresses the opacity of large language models and the unclear origins of their capabilities. Through cross-task activation analysis, the authors uncover, for the first time, a sparse and stable set of “keystone neurons” that emerge during pretraining and are essential for task performance. Building on this finding, they propose an efficient fine-tuning strategy that updates only these neurons. Experiments show that this approach matches or even surpasses full-parameter fine-tuning across multiple tasks, while substantially reducing the number of updated parameters and better preserving the model’s general-purpose capabilities.
📝 Abstract
Large language models (LLMs) display strong comprehensive abilities, yet the internal mechanisms that support these behaviors remain insufficiently understood. In this work, we show that across a wide range of open-weight Transformers, a subset of neurons remains consistently highly activated during inference on tasks spanning multiple capability dimensions. By probing along the cross-task activation strength, we isolate an extremely sparse subset whose removal causes a collapse in model behavior; we term these keystone neurons. Our analysis reveals that keystone neurons are a stable and intrinsic neuron subset of the model that is largely established during pretraining. The parameters associated with these neurons are tightly calibrated during training, and their precise values are critical to the model's capabilities. Building on these insights, we propose a supervised fine-tuning approach that updates only keystone neurons. It achieves task gains comparable to or even better than full-parameter fine-tuning while better preserving performance in other capability dimensions, despite modifying far fewer parameters.
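To make the two ideas in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that candidate neurons are those ranked in the top `k` by mean absolute activation on every task, and that "updating only keystone neurons" means applying gradient steps only to the weight rows of the selected neurons. The abstract does not specify the selection criterion, so the top-`k` intersection rule, the function names, and the toy numbers are all hypothetical. The ablation step that would confirm a candidate is truly a keystone neuron is omitted.

```python
from typing import Dict, List, Set


def top_k_indices(scores: List[float], k: int) -> Set[int]:
    """Return indices of the k largest scores (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(order[:k])


def select_candidate_neurons(
    task_activations: Dict[str, List[float]], k: int
) -> List[int]:
    """Assumed criterion: keep neurons that are in the top k on every task.

    task_activations maps a task name to the mean |activation| of each neuron.
    """
    per_task = [top_k_indices(acts, k) for acts in task_activations.values()]
    return sorted(set.intersection(*per_task))


def masked_sgd_step(
    weights: List[List[float]],
    grads: List[List[float]],
    trainable: Set[int],
    lr: float,
) -> List[List[float]]:
    """Apply one SGD step only to the rows (neurons) listed in `trainable`."""
    return [
        [w - lr * g for w, g in zip(w_row, g_row)] if i in trainable else list(w_row)
        for i, (w_row, g_row) in enumerate(zip(weights, grads))
    ]


# Toy data: 4 neurons, 3 tasks (values are illustrative, not measured).
activations = {
    "math": [0.1, 2.0, 0.3, 1.5],
    "code": [0.2, 1.8, 1.9, 0.1],
    "qa": [1.0, 2.2, 0.1, 0.4],
}
candidates = select_candidate_neurons(activations, k=2)

weights = [[1.0, 1.0] for _ in range(4)]
grads = [[0.5, 0.5] for _ in range(4)]
updated = masked_sgd_step(weights, grads, set(candidates), lr=0.5)
```

In this toy setup only one neuron is in the top 2 on all three tasks, so only its weight row changes. In a real model the same masking could be achieved by zeroing gradients for all non-selected rows before the optimizer step.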