🤖 AI Summary
This work addresses the frequent failure of large reasoning models on complex tasks due to intra-step computational errors, inter-step oscillation, or overthinking, compounded by the absence of a unified and interpretable correction mechanism. Through white-box analysis, the study identifies critical neurons and the activation patterns associated with distinct failure modes, and introduces a unified self-correction framework based on a Mixture of Neurons (MoN). The approach employs a lightweight MLP to detect failures and triggers targeted corrections via special tokens, without requiring reinforcement learning. It achieves the first unified modeling of multi-level reasoning failures, outperforming nine baselines across six benchmarks and six backbone models (8B~70B), with performance gains of up to 27.0% and token-consumption reductions of 19.6%~63.3%.
📝 Abstract
Large Reasoning Models (LRMs) have recently achieved remarkable success on complex reasoning tasks. However, closer scrutiny reveals persistent failure modes that compromise both performance and cost: I) the intra-step level, marked by calculation or derivation errors; II) the inter-step level, involving oscillation and stagnation; and III) the instance level, causing maladaptive over-thinking. Existing endeavors target isolated levels without unification, while their black-box nature and reliance on RL hinder explainability and controllability. To bridge these gaps, we conduct an in-depth white-box analysis, identifying key neurons (Mixture of Neurons, MoN) and the fluctuation patterns associated with distinct failures. Building upon these insights, we propose NeuReasoner, an explainable, controllable, and unified reasoning framework driven by MoN. Technically, NeuReasoner integrates lightweight MLPs for failure detection with a special-token-triggered self-correction mechanism learned via SFT. During inference, special tokens are inserted upon failure detection to actuate controllable remedial behaviors. Extensive evaluations across six benchmarks and six backbone models (8B~70B) against nine competitive baselines demonstrate that NeuReasoner achieves performance gains of up to 27.0% while reducing token consumption by 19.6%~63.3%.
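To make the detect-then-correct loop concrete, here is a minimal sketch of the inference-time mechanism the abstract describes: a lightweight MLP classifies the activations of the identified key neurons (MoN) at each step, and a matching special token is inserted to trigger a remedial behavior. All names, dimensions, and the choice of correction tokens are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical special tokens, one per failure level named in the abstract.
CORRECTION_TOKENS = {
    0: "<recalc>",    # intra-step: calculation/derivation error
    1: "<advance>",   # inter-step: oscillation or stagnation
    2: "<conclude>",  # instance: maladaptive over-thinking
}
NO_FAILURE = 3  # fourth class: step looks fine, no token inserted


class FailureDetectorMLP:
    """One-hidden-layer MLP over MoN activations -> 4 classes (3 failures + ok).

    Weights here are random placeholders; in the framework they would be
    trained on labeled activation traces of the backbone model.
    """

    def __init__(self, n_neurons: int, hidden: int = 32, n_classes: int = 4):
        self.W1 = rng.normal(0.0, 0.1, (n_neurons, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def predict(self, activations: np.ndarray) -> int:
        h = np.maximum(0.0, activations @ self.W1 + self.b1)  # ReLU
        logits = h @ self.W2 + self.b2
        return int(np.argmax(logits))


def maybe_correct(detector: FailureDetectorMLP,
                  activations: np.ndarray,
                  tokens: list[str]) -> list[str]:
    """Append a correction token to the decoded sequence if a failure is detected."""
    cls = detector.predict(activations)
    if cls != NO_FAILURE:
        tokens.append(CORRECTION_TOKENS[cls])
    return tokens


# Usage: after each reasoning step, read out the MoN activations and check them.
detector = FailureDetectorMLP(n_neurons=128)
step_activations = rng.normal(size=128)  # stand-in for real hidden activations
print(maybe_correct(detector, step_activations, ["step_1"]))
```

The design choice to emit a *token* rather than edit hidden states directly is what makes the intervention controllable: the SFT-tuned backbone learns a distinct remedial behavior for each special token, so the correction is visible in the output stream.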