🤖 AI Summary
This work proposes SCOPE, a model routing framework that addresses three limitations of existing routers: they are restricted to fixed model sets, generalize poorly to new models, and cannot adapt to changing computational budgets. Instead of selecting statically by model identifier, SCOPE retrieves how each candidate model behaved on similar past queries and uses that history to estimate the model's accuracy and cost on the current query before any inference is run. By combining a predictor trained with reinforcement learning with this retrieval-based pre-inference estimate, SCOPE generalizes to unseen models and explicitly balances accuracy against computational cost. Experiments show that SCOPE substantially outperforms prior routing approaches, improving accuracy by up to 25.7% in performance-oriented settings and reducing computational cost by up to 95.1% in efficiency-focused settings.
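The retrieval idea can be illustrated with a minimal sketch. It is not SCOPE's predictor: the summary describes a predictor trained with reinforcement learning, whereas the sketch below swaps in a plain nearest-neighbor average. The function names, the record layout, the use of cosine similarity, and the parameter `k` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_from_history(query_vec, history, k=2):
    """Estimate (accuracy, cost) per model from its k most similar past queries.

    history is a list of hypothetical records:
        (query_embedding, model_name, was_correct, cost)
    A simple average stands in for the learned predictor described in the text.
    """
    by_model = defaultdict(list)
    for emb, model, correct, cost in history:
        by_model[model].append((cosine(query_vec, emb), correct, cost))

    estimates = {}
    for model, records in by_model.items():
        # Keep only the k records whose queries are closest to the current one.
        top = sorted(records, key=lambda r: r[0], reverse=True)[:k]
        accuracy = sum(1 for r in top if r[1]) / len(top)
        cost = sum(r[2] for r in top) / len(top)
        estimates[model] = (accuracy, cost)
    return estimates


# Toy history with made-up embeddings, outcomes, and costs.
history = [
    ([1.0, 0.0], "small", True, 1.0),
    ([0.9, 0.1], "small", True, 1.0),
    ([0.0, 1.0], "small", False, 2.0),
    ([1.0, 0.0], "large", True, 10.0),
    ([0.0, 1.0], "large", True, 12.0),
    ([0.1, 0.9], "large", True, 14.0),
]
estimates = estimate_from_history([0.0, 1.0], history, k=2)
```

Because the estimate depends only on recorded behavior and not on a model's name, a model added after training can be scored as soon as some history for it exists, which is the property the summary attributes to SCOPE.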
📝 Abstract
Model routing chooses which language model to use for each query. By sending easy queries to cheaper models and hard queries to stronger ones, it can significantly reduce inference cost while maintaining high accuracy. However, most existing routers treat this as a fixed choice among a small set of models, which makes them hard to adapt to new models or changing budget constraints. In this paper, we propose SCOPE (Scalable and Controllable Outcome Performance Estimator), a routing framework that goes beyond model selection by predicting each candidate model's cost and performance. Trained with reinforcement learning, SCOPE makes reasoning-based predictions by retrieving how models behave on similar problems rather than relying on fixed model names, which enables it to work with new, unseen models. Moreover, by explicitly predicting how accurate and how expensive a model will be, it turns routing into a dynamic decision problem in which users can directly control the trade-off between accuracy and cost. Experiments show that SCOPE is more than a cost-saving tool and adapts flexibly to user needs: it can boost accuracy by up to 25.7% when performance is the priority, or cut costs by up to 95.1% when efficiency matters most.
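Once accuracy and cost are predicted per model, the routing decision itself can be a small, user-controlled step. The sketch below shows one way this could look. The linear utility, the weight `alpha`, the optional `budget` cap, and the model names and numbers are assumptions for illustration; the abstract does not specify SCOPE's actual decision rule.

```python
def route(estimates, alpha, budget=None):
    """Pick a model from predicted (accuracy, cost) pairs.

    estimates: dict mapping model name -> (predicted_accuracy, predicted_cost)
    alpha:     weight in [0, 1]; 1.0 favors accuracy only, 0.0 favors low cost only
    budget:    optional hard cap on predicted cost per query
    """
    if not estimates:
        raise ValueError("no candidate models")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    # Apply the budget cap; if nothing fits, fall back to the cheapest model.
    candidates = {
        m: e for m, e in estimates.items() if budget is None or e[1] <= budget
    }
    if not candidates:
        return min(estimates, key=lambda m: estimates[m][1])

    # Normalize cost so it is on a scale comparable to accuracy.
    max_cost = max(cost for _, cost in candidates.values()) or 1.0

    def utility(model):
        accuracy, cost = candidates[model]
        return alpha * accuracy - (1.0 - alpha) * cost / max_cost

    return max(candidates, key=utility)


# Hypothetical predictions for three candidate models.
estimates = {
    "small": (0.60, 1.0),
    "medium": (0.80, 4.0),
    "large": (0.90, 20.0),
}
performance_first = route(estimates, alpha=1.0)
efficiency_first = route(estimates, alpha=0.0)
balanced = route(estimates, alpha=0.5)
```

With these made-up numbers, `alpha=1.0` selects `large`, `alpha=0.0` selects `small`, and `alpha=0.5` selects `medium`. Changing the weight or the budget changes the choice without retraining anything, which is the kind of control the abstract describes.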