Flexible Routing via Uncertainty Decomposition

📅 2026-05-08

🤖 AI Summary

This work addresses three challenges in model routing: trading off performance against cost, handling inherently ambiguous queries, and adapting to different loss functions and cost parameters. The authors propose an uncertainty-aware router that decomposes total predictive uncertainty into reducible and irreducible components using higher-order predictors [Ahdritz et al., 2025]. Without retraining, the router picks one of three actions per query: predict with the weak model when uncertainty is low, route to the oracle when reducible uncertainty is high, and abstain when irreducible uncertainty is high. Adapting to a different loss function or cost setting requires only hyperparameter changes, and the router comes with theoretical guarantees bounding its regret relative to optimal task-specific routers. The method applies to any classification setting where multiple independent annotations per input are available. Experiments on synthetic and real-world datasets show benefits in suitable regimes, in particular when reducible and irreducible uncertainty are not too correlated.
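To make the split into reducible and irreducible uncertainty concrete, here is a minimal sketch that uses the standard entropy-based decomposition. It assumes the higher-order prediction is given as a finite mixture over candidate label distributions. The function names, the mixture representation, and the choice of entropy as the uncertainty measure are illustrative assumptions, not details taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def decompose_uncertainty(components, weights):
    """Split total uncertainty into irreducible and reducible parts.

    components: candidate label distributions (hypothetical representation
                of a higher-order prediction as a finite mixture).
    weights:    mixture weights over those candidates, summing to 1.
    """
    n_classes = len(components[0])
    # Averaged (first-order) prediction implied by the mixture.
    mean = [
        sum(w * c[k] for w, c in zip(weights, components))
        for k in range(n_classes)
    ]
    total = entropy(mean)
    # Expected entropy of the candidates: label noise that remains even
    # if we knew which candidate distribution was the true one.
    irreducible = sum(w * entropy(c) for w, c in zip(weights, components))
    # The remainder is uncertainty about which candidate is correct.
    reducible = total - irreducible
    return total, irreducible, reducible
```

Under these assumptions, a mixture of two confident but conflicting candidates (`[1, 0]` and `[0, 1]`) yields purely reducible uncertainty, while a single candidate `[0.5, 0.5]` yields purely irreducible uncertainty.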

📝 Abstract

A key strategy for balancing performance and cost in modern machine learning systems is to dynamically route queries to either a low-cost model or a more expensive oracle (such as a large pretrained model or human expert), an approach known as model routing. In this work we present a new uncertainty-aware router that (1) avoids unnecessary oracle calls on inherently ambiguous queries, and (2) adapts dynamically to different loss functions and cost parameters through simple hyperparameter changes, without retraining. Our method, applicable to any classification setting where multiple independent annotations per input are available, is based on decomposing total uncertainty into irreducible and reducible components using higher-order predictors [Ahdritz et al., 2025]. This enables a unified approach to both routing and abstention: predict with the weak model when uncertainty is low, route to the oracle when reducible uncertainty is high, and abstain when irreducible uncertainty is high. Our router comes with strong theoretical guarantees bounding regret relative to optimal task-specific routers. We conduct experiments on both synthetic and real-world datasets that demonstrate the benefits of our approach in suitable regimes -- in particular, whenever reducible and irreducible uncertainty are not too correlated.
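The three-way rule described above can be sketched as a cost comparison. This is one simple way to realize it, not necessarily the paper's router: it assumes total uncertainty stands in for the weak model's expected loss and irreducible uncertainty for the oracle's expected loss, and the names `route`, `oracle_cost`, and `abstain_cost` are hypothetical.

```python
def route(total, irreducible, oracle_cost, abstain_cost):
    """Pick the action with the lowest estimated cost for one query.

    total, irreducible: uncertainty estimates for the query
                        (reducible uncertainty is total - irreducible).
    oracle_cost:        hypothetical price of one oracle call.
    abstain_cost:       hypothetical penalty for declining to answer.
    """
    estimated_cost = {
        # Weak model: assumed to pay for all of the uncertainty.
        "weak": total,
        # Oracle: assumed to remove the reducible part, at a price.
        "oracle": irreducible + oracle_cost,
        # Abstain: fixed penalty regardless of the query.
        "abstain": abstain_cost,
    }
    return min(estimated_cost, key=estimated_cost.get)

# Low uncertainty: the weak model is good enough.
low = route(total=0.05, irreducible=0.02, oracle_cost=0.2, abstain_cost=0.5)
# High reducible uncertainty: the oracle call pays off.
reducible_high = route(total=0.6, irreducible=0.05, oracle_cost=0.2, abstain_cost=0.5)
# High irreducible uncertainty: even the oracle would not help.
ambiguous = route(total=0.7, irreducible=0.65, oracle_cost=0.2, abstain_cost=0.5)
```

In this sketch, changing `oracle_cost` or `abstain_cost` shifts the decision boundaries without refitting any uncertainty estimator, which mirrors the abstract's claim that new cost settings need only hyperparameter changes.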

Problem

Research questions and friction points this paper is trying to address.

model routing

uncertainty decomposition

cost-performance tradeoff

abstention

oracle queries

Innovation

Methods, ideas, or system contributions that make the work stand out.

uncertainty decomposition

model routing

abstention