MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses cost-aware model selection in multi-turn, long-horizon tasks, where repeated LLM invocations accumulate substantial inference cost: at each turn, a router must choose which model from a pool to invoke under a fixed cost budget. The authors propose MTRouter, which encodes the interaction history and each candidate model into joint history-model embeddings and learns an outcome estimator from logged trajectories to predict turn-level model utility. On ScienceWorld, MTRouter surpasses GPT-5 while reducing total cost by 58.7%; on Humanity's Last Exam (HLE), it achieves competitive accuracy at 43.4% lower total cost than GPT-5; and the gains carry over to held-out tasks. Relative to prior multi-turn routers, it switches models less often, is more tolerant of transient errors, and exhibits emergent specialization across models in the pool.

📝 Abstract

Multi-turn, long-horizon tasks are increasingly common for large language models (LLMs), but solving them typically requires many sequential model invocations, accumulating substantial inference costs. Here, we study cost-aware multi-turn LLM routing: selecting which model to invoke at each turn from a model pool, given a fixed cost budget. We propose MTRouter, which encodes the interaction history and candidate models into joint history-model embeddings, and learns an outcome estimator from logged trajectories to predict turn-level model utility. Experiments show that MTRouter improves the performance-cost trade-off: on ScienceWorld, it surpasses GPT-5 while reducing total cost by 58.7%; on Humanity's Last Exam (HLE), it achieves competitive accuracy while reducing total cost by 43.4% relative to GPT-5, and these gains even carry over to held-out tasks. Further analyses reveal several mechanisms underlying its effectiveness: relative to prior multi-turn routers, MTRouter makes fewer model switches, is more tolerant to transient errors, and exhibits emergent specialization across models. Code: https://github.com/ZhangYiqun018/MTRouter
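
To make the per-turn decision concrete, here is a minimal Python sketch of cost-aware routing over joint history-model features. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the encoder, the estimator architecture, or how cost enters the decision. The feature construction (concatenation plus elementwise product), the logistic scorer, the flat `cost_per_turn`, and the `cost_weight` trade-off parameter are all hypothetical stand-ins.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A model in the pool (all fields are illustrative assumptions)."""
    name: str
    embedding: Sequence[float]  # hypothetical fixed vector describing the model
    cost_per_turn: float        # hypothetical flat cost of one invocation


def joint_embedding(history_vec: Sequence[float],
                    model_vec: Sequence[float]) -> list:
    """Combine history and model vectors into one feature vector.

    Assumed form: [history, model, history * model]; both inputs must
    have the same length for the elementwise product.
    """
    if len(history_vec) != len(model_vec):
        raise ValueError("history and model vectors must have equal length")
    product = [h * m for h, m in zip(history_vec, model_vec)]
    return list(history_vec) + list(model_vec) + product


def predict_utility(weights: Sequence[float],
                    features: Sequence[float]) -> float:
    """Stand-in for a learned outcome estimator: a logistic linear scorer."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def route_turn(history_vec: Sequence[float],
               candidates: Sequence[Candidate],
               weights: Sequence[float],
               remaining_budget: float,
               cost_weight: float) -> Optional[Candidate]:
    """Pick the affordable model with the best utility-minus-cost score.

    Returns None when no candidate fits in the remaining budget.
    """
    affordable = [c for c in candidates if c.cost_per_turn <= remaining_budget]
    if not affordable:
        return None

    def net_value(c: Candidate) -> float:
        features = joint_embedding(history_vec, c.embedding)
        return predict_utility(weights, features) - cost_weight * c.cost_per_turn

    return max(affordable, key=net_value)
```

In a multi-turn loop, the caller would re-encode the history after each turn, subtract the chosen model's cost from `remaining_budget`, and call `route_turn` again. Raising `cost_weight` pushes the choice toward cheaper models; in a real system the weights would be fit on logged trajectories rather than set by hand.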

Problem

Research questions and friction points this paper is trying to address.

multi-turn LLM routing

cost-aware

model selection

inference cost

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

cost-aware routing

multi-turn LLM

history-model joint embeddings