🤖 AI Summary
This work addresses dynamic routing in multi-LLM systems, aiming to learn optimal routing policies solely from low-cost observational feedback (i.e., the outcome of only the model actually deployed for each query), thereby avoiding reliance on expensive full-feedback supervision and error-prone decoupled "predict-then-select" paradigms. We propose the first causally grounded, end-to-end routing framework, introducing two theoretically justified surrogate objectives: a classification-based upper bound and a softmax-weighted regret approximation. To capture heterogeneous cost preferences, we design an interval-conditioned neural architecture. Evaluated on public benchmarks across diverse embedding models, our method achieves state-of-the-art performance, significantly outperforming existing approaches. The results demonstrate both the feasibility and superiority of learning robust, efficient routing policies from observational data alone, without ground-truth labels or explicit error modeling.
📝 Abstract
LLM routing aims to select the most appropriate model for each query, balancing competing objectives such as accuracy and cost across a pool of language models. Prior approaches typically adopt a decoupled strategy in which these metrics are first predicted and a model is then selected based on the estimates. This setup is prone to compounding errors and often relies on full-feedback data, where each query is evaluated by all candidate models, which is costly to obtain and maintain in practice. In contrast, we learn from observational data, which records only the outcome of the model actually deployed. We propose a causal end-to-end framework that learns routing policies by minimizing decision-making regret from observational data. To enable efficient optimization, we introduce two theoretically grounded surrogate objectives: a classification-based upper bound and a softmax-weighted regret approximation shown to recover the optimal policy at convergence. We further extend our framework to handle heterogeneous cost preferences via an interval-conditioned architecture. Experiments on public benchmarks show that our method outperforms existing baselines, achieving state-of-the-art performance across different embedding models.
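To make the softmax-weighted regret surrogate concrete, the sketch below shows the core idea in isolation: the surrogate is the expected regret under a softmax routing policy, so driving it down pushes probability mass toward the lowest-regret model. This is a minimal illustration, not the paper's implementation; the per-model regret values, the function names, and the single-query setting are all hypothetical (in the paper, regrets are estimated from observational data rather than given).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over model logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def softmax_weighted_regret(logits, regrets):
    # Surrogate objective: expected regret under the softmax policy.
    # Minimizing this concentrates the policy on the min-regret model.
    return float(softmax(logits) @ regrets)

# Hypothetical per-model regret estimates for one query (illustrative values).
regrets = np.array([0.30, 0.05, 0.50])

# A uniform policy pays the average regret ...
loss_uniform = softmax_weighted_regret(np.zeros(3), regrets)
# ... while a policy concentrated on the low-regret model pays much less.
loss_sharp = softmax_weighted_regret(np.array([0.0, 5.0, 0.0]), regrets)
```

As the logits sharpen toward the argmin-regret model, the surrogate approaches the minimum per-query regret, which is the intuition behind the convergence claim in the abstract.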