Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time

📅 2025-09-26

🤖 AI Summary

Background: Existing test-time scaling (TTS) methods rely predominantly on output-level sampling. They overlook the architectural flexibility of the underlying model.

Method: We propose Dynamic Experts Search (DES), a TTS strategy for Mixture-of-Experts (MoE) large language models. DES treats the number of activated experts as a controllable, inference-time search dimension. It combines two components. Dynamic MoE sets the expert count directly during inference. Expert Configuration Inheritance keeps that count fixed within a reasoning path while varying it across paths. Together they yield diverse yet stable multi-path inference without additional computational cost. DES requires no model modification or retraining: it adjusts expert activation counts at inference time, and a verifier scores the candidate trajectories.

Contribution/Results: DES consistently outperforms TTS baselines in both accuracy and stability on mathematical reasoning, code generation, and knowledge-intensive benchmarks. It generalizes across MoE architectures and verifiers and can be applied plug-and-play.
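The summary's core idea is to treat the number of activated experts as a knob set at inference time. A toy top-k router can illustrate this. The sketch below is not the authors' implementation. The scalar experts, the fixed router logits, and the function name `moe_forward` are all hypothetical, chosen only to show one point: changing `top_k` at call time changes which experts contribute, and therefore the output, while the model's weights stay untouched.

```python
import math
from typing import Callable, Sequence

# Hypothetical toy setup: each "expert" is a scalar function and the router
# gives one logit per expert for the current token. Real MoE layers act on
# hidden-state vectors; scalars keep the sketch dependency-free.
Expert = Callable[[float], float]


def moe_forward(x: float, experts: Sequence[Expert],
                router_logits: Sequence[float], top_k: int) -> float:
    """Top-k MoE routing where top_k is chosen per call (the 'dynamic' knob)."""
    if not 1 <= top_k <= len(experts):
        raise ValueError("top_k must be between 1 and the number of experts")
    # Indices of the top_k highest router logits.
    chosen = sorted(range(len(experts)), key=lambda i: router_logits[i],
                    reverse=True)[:top_k]
    # Softmax over the selected experts only (numerically stabilized).
    m = max(router_logits[i] for i in chosen)
    weights = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(weights)
    # Gate-weighted sum of the selected experts' outputs.
    return sum(w / total * experts[i](x) for w, i in zip(weights, chosen))


# Toy example: four experts and fixed router logits for a single token.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: 0.5 * x]
logits = [2.0, 1.0, 0.5, -1.0]
for k in (1, 2, 4):
    print(f"top_k={k}: {moe_forward(3.0, experts, logits, k):.4f}")
```

Only the routing width changes between the three calls. In an actual MoE LLM, the same input would therefore be decoded along different computational paths.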

📝 Abstract

Test-Time Scaling (TTS) enhances the reasoning ability of large language models (LLMs) by allocating additional computation during inference. However, existing approaches primarily rely on output-level sampling while overlooking the role of model architecture. In mainstream Mixture-of-Experts (MoE) LLMs, we observe that varying the number of activated experts yields complementary solution sets with stable accuracy, revealing a new and underexplored source of diversity. Motivated by this observation, we propose Dynamic Experts Search (DES), a TTS strategy that elevates expert activation into a controllable dimension of the search space. DES integrates two key components: (1) Dynamic MoE, which enables direct control of expert counts during inference to generate diverse reasoning trajectories without additional cost; and (2) Expert Configuration Inheritance, which preserves consistent expert counts within a reasoning path while varying them across runs, thereby balancing stability and diversity throughout the search. Extensive experiments across MoE architectures, verifiers and reasoning benchmarks (i.e., math, code and knowledge) demonstrate that DES reliably outperforms TTS baselines, enhancing accuracy and stability without additional cost. These results highlight DES as a practical and scalable form of architecture-aware TTS, illustrating how structural flexibility in modern LLMs can advance reasoning.
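The sketch below shows how Expert Configuration Inheritance could sit inside a verifier-guided search. It assumes a step-level beam search, which the abstract does not specify. `expert_search`, `generate_step`, `verify`, `k_choices`, and the toy stubs are hypothetical placeholders, not the paper's interfaces. In a real system, `generate_step` would decode one reasoning step from the MoE model with `top_k` experts active, and `verify` would be a reward model. Each initial beam gets a different expert count, and every child inherits its parent's count. The count therefore stays fixed within a path and varies across paths.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Path:
    steps: List[str]
    top_k: int          # expert count, fixed for the whole path (inherited)
    score: float = 0.0  # verifier score of the partial path


def expert_search(
    generate_step: Callable[[List[str], int, int], str],
    verify: Callable[[List[str]], float],
    k_choices: Sequence[int],
    beam_width: int,
    expansions: int,
    depth: int,
) -> Path:
    # Initialization: spread the starting beams across different expert counts.
    beams = [Path([], k_choices[i % len(k_choices)]) for i in range(beam_width)]
    for _ in range(depth):
        candidates: List[Path] = []
        for parent in beams:
            for j in range(expansions):
                # Inheritance: each child decodes with its parent's top_k.
                steps = parent.steps + [generate_step(parent.steps, parent.top_k, j)]
                candidates.append(Path(steps, parent.top_k, verify(steps)))
        # Keep the best partial paths by verifier score, whatever their top_k.
        beams = heapq.nlargest(beam_width, candidates, key=lambda p: p.score)
    return max(beams, key=lambda p: p.score)


# Hypothetical toy stubs standing in for the MoE decoder and the verifier.
def toy_generate(prefix: List[str], top_k: int, branch: int) -> str:
    return f"k={top_k};b={branch}"


def toy_verify(steps: List[str]) -> float:
    # Pretend steps decoded with 4 experts score higher; branch 0 slightly better.
    return sum((1.0 if s.startswith("k=4;") else 0.5)
               + (0.1 if s.endswith(";b=0") else 0.0) for s in steps)


best = expert_search(toy_generate, toy_verify, k_choices=(2, 4, 8),
                     beam_width=3, expansions=2, depth=3)
print(best.top_k, best.steps, round(best.score, 2))
```

This sketch rescores partial paths with the verifier after every step, which is one common choice. The paper's actual search procedure and selection rule may differ.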

Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning in Mixture-of-Experts LLMs

Controlling expert activation for diverse reasoning trajectories

Improving accuracy and stability without additional cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic MoE enables direct control of expert counts

Expert Configuration Inheritance balances stability and diversity

Architecture-aware TTS enhances reasoning without additional cost