Identifying and Transferring Reasoning-Critical Neurons: Improving LLM Inference Reliability via Activation Steering

📅 2026-01-27
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the suboptimal performance of large language models on complex reasoning tasks, often attributed to unreliable internal activations. While existing approaches rely on costly post-training or extensive sampling, this paper introduces AdaRAS—a lightweight, efficient framework that identifies and leverages "reasoning-critical neurons" (RCNs), whose activations are strongly correlated with reasoning correctness. AdaRAS selects RCNs via a polarity-aware mean-difference criterion and adaptively modulates their activations at test time, without any additional training or sampling. The method shows strong cross-model and cross-dataset transferability, achieving significant gains across ten mathematics and coding benchmarks. Notably, it improves scores on AIME-24 and AIME-25 by over 13%, outperforming conventional post-training strategies.

📝 Abstract
Despite the strong reasoning capabilities of recent large language models (LLMs), achieving reliable performance on challenging tasks often requires post-training or computationally expensive sampling strategies, limiting their practical efficiency. In this work, we first show that a small subset of neurons in LLMs exhibits strong predictive correlations with reasoning correctness. Based on this observation, we propose AdaRAS (Adaptive Reasoning Activation Steering), a lightweight test-time framework that improves reasoning reliability by selectively intervening on neuron activations. AdaRAS identifies Reasoning-Critical Neurons (RCNs) via a polarity-aware mean-difference criterion and adaptively steers their activations during inference, enhancing incorrect reasoning traces while avoiding degradation on already-correct cases. Experiments on 10 mathematics and coding benchmarks demonstrate consistent improvements, including over 13% gains on AIME-24 and AIME-25. Moreover, AdaRAS exhibits strong transferability across datasets and scalability to stronger models, outperforming post-training methods without additional training or sampling cost.
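The abstract's two steps — scoring neurons by a polarity-aware mean-difference criterion, then steering the selected activations at inference — can be sketched as follows. This is a minimal illustration under assumed interfaces (`identify_rcns`, `steer`, the `alpha` strength, and top-k selection are illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def identify_rcns(acts, correct, k=2):
    """Rank neurons by a polarity-aware mean-difference criterion (sketch).

    acts:    (n_samples, n_neurons) activations collected per reasoning trace
    correct: boolean array marking whether each trace was correct
    Returns the top-k neuron indices by |mean difference| and the sign
    (polarity) of that difference for each selected neuron.
    """
    mu_pos = acts[correct].mean(axis=0)    # mean activation on correct traces
    mu_neg = acts[~correct].mean(axis=0)   # mean activation on incorrect traces
    diff = mu_pos - mu_neg                 # per-neuron mean difference
    idx = np.argsort(-np.abs(diff))[:k]    # top-k reasoning-critical neurons
    return idx, np.sign(diff[idx])

def steer(h, idx, polarity, alpha=1.0):
    """Shift selected neurons' activations toward the 'correct' direction."""
    h = h.copy()
    h[idx] += alpha * polarity             # polarity-aware additive steering
    return h
```

At test time, `steer` would be applied to the hidden activations of the identified layer on each forward pass; the adaptive part of AdaRAS (deciding when and how strongly to intervene) is omitted here.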
Problem

Research questions and friction points this paper is trying to address.

reasoning reliability
large language models
neuron activation
inference efficiency
reasoning correctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning-Critical Neurons
Activation Steering
AdaRAS
Test-Time Intervention
LLM Reliability