🤖 AI Summary
This work addresses a limitation of existing approaches: they overlook specialization disparities among large language models (LLMs) across diverse tasks and therefore struggle to balance varied reasoning demands and task complexities. To this end, the authors propose a task-aware LLM council framework that integrates Monte Carlo Tree Search (MCTS) with a structured archive of successful execution trajectories. The framework dynamically selects the most suitable expert model by semantically matching the current task context against historical successful paths. It further introduces an adaptive dual-signal weighting mechanism that combines real-time model evaluation with historical utility to guide decision-making. Experiments on benchmarks including WebShop, HumanEval, and the Game of 24 demonstrate significant improvements in both task success rates and search efficiency over strong baselines.
📝 Abstract
Large language models (LLMs) have shown strong capabilities across diverse decision-making tasks. However, existing approaches often overlook the specialization differences among available models, treating all LLMs as uniformly applicable regardless of task characteristics. This limits their ability to adapt to varying reasoning demands and task complexities. In this work, we propose Task-Aware LLM Council (TALC), a task-adaptive decision framework that integrates a council of LLMs with Monte Carlo Tree Search (MCTS) to enable dynamic expert selection and efficient multi-step planning. Each LLM is equipped with a structured success memory profile derived from prior task trajectories, enabling semantic matching between current reasoning context and past successes. At each decision point, TALC routes control to the most contextually appropriate model and estimates node value using a dual-signal mechanism that fuses model-based evaluations with historical utility scores. These signals are adaptively weighted based on intra-node variance and used to guide MCTS selection, allowing the system to balance exploration depth with planning confidence. Experiments on WebShop, HumanEval, and the Game of 24 demonstrate that TALC achieves superior task success rates and improved search efficiency compared to strong baselines, validating the benefits of specialization-aware routing and adaptive planning.
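The dual-signal value estimation described in the abstract fuses model-based evaluations with historical utility scores, weighted adaptively by intra-node variance. The paper does not give the exact formula here, but a minimal sketch of one plausible form, where the weight on the model signal shrinks as the node's evaluations disagree more, might look like this (the function name, weighting form, and inputs are illustrative assumptions, not the authors' implementation):

```python
import statistics


def node_value(model_scores: list[float], historical_utility: float) -> float:
    """Hypothetical dual-signal fusion for an MCTS node.

    model_scores: per-model evaluations of this node from the LLM council.
    historical_utility: a utility score derived from archived success
    trajectories that semantically match the current context.
    When the council's evaluations disagree (high intra-node variance),
    this sketch leans more on the historical signal.
    """
    mean_eval = statistics.fmean(model_scores)
    variance = statistics.pvariance(model_scores)
    # Assumed weighting: confidence in the model signal decays with variance.
    w = 1.0 / (1.0 + variance)
    return w * mean_eval + (1.0 - w) * historical_utility
```

With unanimous evaluations the variance is zero, so the fused value equals the council mean; as disagreement grows, the value is pulled toward the historical utility, which matches the abstract's goal of balancing planning confidence against exploration.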