🤖 AI Summary
To address the limited individual accuracy of small language models (SLMs) and the poor fit of existing orchestration methods, which primarily target frontier models, this paper proposes SLM-MUX: a three-stage approach comprising a multi-model architecture that coordinates multiple SLMs, a model selection search that identifies the most complementary SLMs from a given pool, and test-time scaling tailored to SLM-MUX. Evaluated on MATH, GPQA, and GSM8K, SLM-MUX improves accuracy over existing orchestration methods by up to 13.4%, 8.8%, and 7.0%, respectively. Notably, with just two SLMs it outperforms Qwen 2.5 72B on GPQA and GSM8K and matches it on MATH, at substantially lower computational cost. The paper also provides theoretical analyses substantiating the method's advantages.
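The "complementary model selection" idea can be illustrated with a minimal sketch: given per-model correctness records on a validation set, search for the subset of models whose combined coverage is largest. The objective and search procedure here are simplified assumptions for illustration, not the paper's actual algorithm, and the model names and data are hypothetical.

```python
from itertools import combinations

def select_complementary(correct, k=2):
    """Pick the k-model subset with the largest union coverage on a
    validation set. This brute-force coverage objective is a stand-in
    for the paper's model selection search, whose exact criterion may
    differ."""
    names = list(correct)
    n = len(next(iter(correct.values())))
    best_subset, best_cov = None, -1.0
    for subset in combinations(names, k):
        # A question counts as covered if any model in the subset solves it.
        covered = sum(any(correct[m][i] for m in subset) for i in range(n))
        cov = covered / n
        if cov > best_cov:
            best_subset, best_cov = subset, cov
    return best_subset, best_cov

# Hypothetical validation results: 1 means the model solved question i.
correct = {
    "slm_math": [1, 1, 0, 0, 1],
    "slm_code": [0, 1, 1, 1, 0],
    "slm_gen":  [1, 0, 0, 1, 0],
}
pair, coverage = select_complementary(correct, k=2)
# pair == ("slm_math", "slm_code"), coverage == 1.0
```

Note that the selected pair is not necessarily the two individually strongest models: complementarity rewards models whose errors fall on different questions.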
📝 Abstract
With the rapid development of language models, the number of small language models (SLMs) has grown significantly. Although they do not achieve state-of-the-art accuracy, they are more efficient and often excel at specific tasks. This raises a natural question: can multiple SLMs be orchestrated into a system where each contributes effectively, achieving higher accuracy than any individual model? Existing orchestration methods have primarily targeted frontier models (e.g., GPT-4) and perform suboptimally when applied to SLMs. To address this gap, we propose a three-stage approach for orchestrating SLMs. First, we introduce SLM-MUX, a multi-model architecture that effectively coordinates multiple SLMs. Building on this, we develop two optimization strategies: (i) a model selection search that identifies the most complementary SLMs from a given pool, and (ii) test-time scaling tailored to SLM-MUX. Our approach delivers strong results: compared to existing orchestration methods, it achieves up to 13.4% improvement on MATH, 8.8% on GPQA, and 7.0% on GSM8K. With just two SLMs, SLM-MUX outperforms Qwen 2.5 72B on GPQA and GSM8K, and matches its performance on MATH. We further provide theoretical analyses to substantiate the advantages of our method. In summary, we demonstrate that SLMs can be effectively orchestrated into more accurate and efficient systems through the proposed approach.
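The core orchestration question — can several SLMs jointly beat any single one? — can be sketched with a simple aggregation scheme: query each model and take the majority answer, using cross-model agreement as a confidence signal. This is a generic heuristic for illustration only; the stub models are hypothetical and the paper's actual SLM-MUX aggregation rule may differ.

```python
from collections import Counter

def mux_vote(models, prompt):
    """Query each small model once and return the majority answer plus an
    agreement score in [0, 1]. Cross-model agreement is a common confidence
    signal in model orchestration; SLM-MUX's actual rule may differ."""
    answers = [model(prompt) for model in models]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)

# Stub callables standing in for SLM inference (hypothetical models).
slm_a = lambda prompt: "42"
slm_b = lambda prompt: "42"
slm_c = lambda prompt: "41"   # a dissenting model

answer, agreement = mux_vote([slm_a, slm_b, slm_c], "What is 6 * 7?")
# answer == "42", agreement == 2/3
```

A system like this can exceed each individual model's accuracy precisely when the models' errors are decorrelated, which is why the complementary model selection step matters.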