NEX: Neuron Explore-Exploit Scoring for Label-Free Chain-of-Thought Selection and Model Ranking

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the selection bottleneck that arises when large language models sample multiple reasoning paths or search over merged checkpoints without supervision on the target distribution. The authors propose NEX, a white-box, label-free exploration-exploitation (E-X) scoring framework grounded in MLP neuron activation dynamics. Using sparse activation caches, the method detects exploration as spikes in newly activated neurons per token, infers E-X phases with a sticky two-state hidden Markov model, and credits exploration-introduced neurons by whether they are reused in the following exploitation span; these signals are combined into a single Good-Mass Fraction score. Computed on a small unlabeled activation set, the score predicts downstream accuracy and ranks candidate responses and model variants across reasoning benchmarks and Qwen3 merge families. The E-X signal is further validated with human annotations and causal "Effective-vs-Redundant" neuron transfer experiments.

📝 Abstract
Large language models increasingly spend inference compute sampling multiple chain-of-thought traces or searching over merged checkpoints. This shifts the bottleneck from generation to selection, often without supervision on the target distribution. We show entropy-based exploration proxies follow an inverted-U with accuracy, suggesting extra exploration can become redundant and induce overthinking. We propose NEX, a white-box, label-free, unsupervised scoring framework that views reasoning as alternating E-phase (exploration) and X-phase (exploitation). NEX detects E-phase as spikes in newly activated MLP neurons per token from sparse activation caches, then uses a sticky two-state HMM to infer E-X phases and credits E-introduced neurons by whether they are reused in the following X span. These signals yield interpretable neuron weights and a single Good-Mass Fraction score to rank candidate responses and merged variants without task answers. Across reasoning benchmarks and Qwen3 merge families, NEX computed on a small unlabeled activation set predicts downstream accuracy and identifies better variants; we further validate the E-X signal with human annotations and provide causal evidence via "Effective-vs-Redundant" neuron transfer.
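The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Poisson emission model, the rates, the stay probability, the function names, and the exact Good-Mass Fraction formula (reused E-introduced neurons divided by all E-introduced neurons) are all assumptions made here for illustration. The input is one set of active neuron ids per token, standing in for a sparse activation cache.

```python
import math

def novelty_counts(active_sets):
    """Per token, count neurons firing for the first time in the trace."""
    seen, counts, new_sets = set(), [], []
    for active in active_sets:
        new = set(active) - seen
        counts.append(len(new))
        new_sets.append(new)
        seen |= new
    return counts, new_sets

def viterbi_two_state(counts, stay=0.9, rate_x=0.5, rate_e=4.0):
    """Decode phases (1 = E, 0 = X) with a sticky two-state HMM.
    Poisson emissions and all parameter values are assumptions."""
    if not counts:
        return []
    def log_pois(k, lam):
        return k * math.log(lam) - lam - math.lgamma(k + 1)
    rates = (rate_x, rate_e)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    score = [math.log(0.5) + log_pois(counts[0], r) for r in rates]
    back = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + log_pois(k, rates[s]))
        score = new_score
        back.append(ptr)
    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def good_mass_fraction(active_sets, stay=0.9):
    """Hypothetical score: share of E-introduced neurons that are
    active again somewhere in the X span that follows."""
    counts, new_sets = novelty_counts(active_sets)
    phases = viterbi_two_state(counts, stay=stay)
    good = total = 0
    t, n = 0, len(phases)
    while t < n:
        if phases[t] != 1:
            t += 1
            continue
        e_start = t
        while t < n and phases[t] == 1:   # consume the E span
            t += 1
        introduced = set().union(*new_sets[e_start:t])
        x_start = t
        while t < n and phases[t] == 0:   # consume the following X span
            t += 1
        reused = set().union(*active_sets[x_start:t]) & introduced
        good += len(reused)
        total += len(introduced)
    return good / total if total else 0.0

# Toy traces: one reuses its early neurons, the other never does.
reusing = [{1, 2, 3, 4, 5}, {1, 2}, {2, 3}, {1}]
redundant = [{1, 2, 3, 4, 5}, {6}, {6}, {6}]
ranked = sorted([reusing, redundant], key=good_mass_fraction, reverse=True)
```

Under these assumptions, a trace whose exploration-phase neurons are revisited during the following exploitation span scores higher than one whose neurons are never touched again, so sorting candidates by the score puts the reusing trace first. No task answers are involved at any point.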
Problem

Research questions and friction points this paper is trying to address.

chain-of-thought selection
model ranking
label-free evaluation
reasoning efficiency
unsupervised scoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explore-Exploit Scoring
Label-Free Model Selection
Chain-of-Thought Ranking
Sparse Neuron Activation
White-Box Interpretability
Kang Chen
Fudan University
Zhuoka Feng
Fudan University
Sihan Zhao
Fudan University
Kai Xiong
Fudan University
Junjie Nian
Fudan University
Yaoning Wang
Fudan University
Changyi Xiao
Fudan University
Yixin Cao
Fudan University
Natural Language Processing, Knowledge Engineering, Multi-modal data processing