🤖 AI Summary
This work addresses the frequent failure of large reasoning models on complex tasks due to intra-step computational errors, inter-step oscillation, or overthinking, compounded by the absence of a unified and interpretable correction mechanism. Through white-box analysis, the study identifies critical neurons and the activation patterns associated with distinct failure modes, and introduces a unified self-correction framework based on a Mixture of Neurons (MoN). The approach employs a lightweight MLP to detect failures and triggers targeted corrections via special tokens, without requiring reinforcement learning. It achieves the first unified modeling of multi-level reasoning failures, outperforming nine baselines across six benchmarks and six backbone models (8B~70B), with performance gains of up to 27.0% and token-consumption reductions of 19.6%~63.3%.
📝 Abstract
Large Reasoning Models (LRMs) have recently achieved remarkable success on complex reasoning tasks. However, closer scrutiny reveals persistent failure modes that compromise both performance and cost: I) the intra-step level, marked by calculation or derivation errors; II) the inter-step level, involving oscillation and stagnation; and III) the instance level, causing maladaptive over-thinking. Existing endeavors target isolated levels without unification, while their black-box nature and reliance on RL hinder explainability and controllability. To bridge these gaps, we conduct an in-depth white-box analysis, identifying key neurons (Mixture of Neurons, MoN) and the fluctuation patterns associated with distinct failures. Building upon these insights, we propose NeuReasoner, an explainable, controllable, and unified reasoning framework driven by MoN. Technically, NeuReasoner integrates lightweight MLPs for failure detection with a special-token-triggered self-correction mechanism learned via SFT. During inference, special tokens are inserted upon failure detection to actuate controllable remedial behaviors. Extensive evaluations across six benchmarks and six backbone models (8B~70B) against nine competitive baselines demonstrate that NeuReasoner achieves performance gains of up to 27.0% while reducing token consumption by 19.6%~63.3%.
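To make the detect-then-correct loop concrete, here is a minimal sketch of the inference-time mechanism the abstract describes: a lightweight MLP classifies the activations of the identified key neurons (MoN) at each step, and a matching special token is inserted to trigger a remedial behavior. All names, dimensions, and the choice of correction tokens are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical special tokens, one per failure level named in the abstract.
CORRECTION_TOKENS = {
    0: "<recalc>",    # intra-step: calculation/derivation error
    1: "<advance>",   # inter-step: oscillation or stagnation
    2: "<conclude>",  # instance: maladaptive over-thinking
}
NO_FAILURE = 3  # fourth class: step looks fine, no token inserted


class FailureDetectorMLP:
    """One-hidden-layer MLP over MoN activations -> 4 classes (3 failures + ok).

    Weights here are random placeholders; in the framework they would be
    trained on labeled activation traces of the backbone model.
    """

    def __init__(self, n_neurons: int, hidden: int = 32, n_classes: int = 4):
        self.W1 = rng.normal(0.0, 0.1, (n_neurons, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def predict(self, activations: np.ndarray) -> int:
        h = np.maximum(0.0, activations @ self.W1 + self.b1)  # ReLU
        logits = h @ self.W2 + self.b2
        return int(np.argmax(logits))


def maybe_correct(detector: FailureDetectorMLP,
                  activations: np.ndarray,
                  tokens: list[str]) -> list[str]:
    """Append a correction token to the decoded sequence if a failure is detected."""
    cls = detector.predict(activations)
    if cls != NO_FAILURE:
        tokens.append(CORRECTION_TOKENS[cls])
    return tokens


# Usage: after each reasoning step, read out the MoN activations and check them.
detector = FailureDetectorMLP(n_neurons=128)
step_activations = rng.normal(size=128)  # stand-in for real hidden activations
print(maybe_correct(detector, step_activations, ["step_1"]))
```

The design choice to emit a *token* rather than edit hidden states directly is what makes the intervention controllable: the SFT-tuned backbone learns a distinct remedial behavior for each special token, so the correction is visible in the output stream.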