🤖 AI Summary
This work addresses the inconsistent performance of large language models in mathematical reasoning across problems of varying difficulty. To mitigate this issue, the authors propose an adaptive multi-expert reasoning framework that coordinates multiple expert models through a difficulty-aware routing mechanism and an uncertainty-guided dynamic sampling strategy. The framework further improves robustness by combining a neural verifier with a clustering-based aggregation scheme. Notably, the method uses only the original training data, with no synthetic examples, and achieves 75.28% accuracy on the GSM8K benchmark, outperforming most existing 7B-scale models that depend on augmented or synthetic data. This result underscores the framework’s capacity for efficient and robust mathematical reasoning without additional training data.
📝 Abstract
Large language models (LLMs) perform strongly on math reasoning benchmarks, but their accuracy is inconsistent across problems of different difficulty. This paper describes Adaptive Multi-Expert Reasoning (AMR), a framework that adapts its reasoning strategy to problem complexity. An agile routing system that operates on the problem text predicts each problem's difficulty and uncertainty and guides a reconfigurable sampling mechanism that controls the breadth of generation. Three specialized experts generate candidate responses, which are refined through multiple correction and finalization phases. A neural verifier assesses the correctness of the responses, and a clustering-based aggregation technique selects the final answer based on a combination of consensus and answer quality. Evaluated on the GSM8K dataset, AMR achieved 75.28% accuracy using only the original training data, outperforming the majority of comparable 7B models that were trained on synthetic data. These results indicate that difficulty-based routing and uncertainty-driven aggregation are efficient and effective ways to improve the robustness of math reasoning models.
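The abstract does not give formulas for the sampling budget or the aggregation score, so the following is only a minimal sketch of how those two steps could fit together. Every name and constant here (`sample_budget`, `aggregate`, `normalize`, `consensus_weight`, the 2 to 16 sample range, the linear weighting) is a hypothetical choice for illustration, not the authors' implementation. Router outputs and verifier scores are assumed to lie in [0, 1].

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str            # final answer extracted from one expert response
    verifier_score: float  # assumed verifier estimate of correctness, in [0, 1]


def sample_budget(difficulty: float, uncertainty: float,
                  min_samples: int = 2, max_samples: int = 16) -> int:
    """Map router outputs to a number of samples (hypothetical linear rule)."""
    # Assumption: average the two signals, clamp to [0, 1], interpolate linearly.
    signal = min(1.0, max(0.0, 0.5 * (difficulty + uncertainty)))
    return min_samples + round(signal * (max_samples - min_samples))


def normalize(answer: str) -> str:
    """Canonicalize numeric answers so that '18', '18.0' and '$18' match."""
    text = answer.strip().replace(",", "").replace("$", "")
    try:
        value = float(text)
    except ValueError:
        return text  # non-numeric answers are compared as plain strings
    return str(int(value)) if value.is_integer() else str(value)


def aggregate(candidates: list[Candidate], consensus_weight: float = 0.5) -> str:
    """Cluster candidates by answer and pick the best cluster.

    Each cluster is scored by a weighted sum of consensus (share of all
    candidates in the cluster) and quality (mean verifier score).
    """
    if not candidates:
        raise ValueError("no candidates to aggregate")

    clusters: dict[str, list[float]] = defaultdict(list)
    for cand in candidates:
        clusters[normalize(cand.answer)].append(cand.verifier_score)

    total = len(candidates)

    def cluster_score(item: tuple[str, list[float]]) -> float:
        _, scores = item
        consensus = len(scores) / total
        quality = sum(scores) / len(scores)
        return consensus_weight * consensus + (1.0 - consensus_weight) * quality

    best_answer, _ = max(clusters.items(), key=cluster_score)
    return best_answer


if __name__ == "__main__":
    n = sample_budget(difficulty=0.5, uncertainty=0.5)
    pool = [Candidate("18", 0.9), Candidate("18.0", 0.8),
            Candidate("20", 0.95), Candidate("18", 0.7)]
    print(n, aggregate(pool))
```

In this sketch, a harder or more uncertain problem receives a larger sampling budget, and a confident minority answer can still beat a low-scoring majority because quality and consensus are weighed together. Plain majority voting corresponds to `consensus_weight=1.0`.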