Online Learning-to-Defer with Varying Experts

📅 2026-05-12

🤖 AI Summary
This work addresses the online learning-to-defer (L2D) problem in dynamic settings with streaming data, time-varying expert availability, and shifting expert reliability. It proposes the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. By combining novel $\mathcal{H}$-consistency bounds for the online framework with first-order methods for online convex optimization, the algorithm achieves regret bounds of $O((n + n_e)T^{2/3})$ in general and $O((n + n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. Experiments on synthetic and real-world datasets indicate that the method extends standard L2D to settings with varying expert availability and reliability.
📝 Abstract
Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of $O((n+n_e)T^{2/3})$ in general and $O((n+n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. The analysis builds on novel $\mathcal{H}$-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.
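The interaction protocol described in the abstract (a per-round choice between the model and whichever experts are currently available, with only the chosen action's loss revealed) can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps in a plain epsilon-greedy rule over running loss estimates, and the expert names, accuracies, availability probability, and the function `simulate_online_l2d` are all hypothetical values chosen for illustration. Labels and inputs are not modeled; each action simply incurs a 0-1 loss at a fixed rate.

```python
import random


def simulate_online_l2d(T=2000, eps=0.1, seed=0):
    """Toy deferral loop: bandit feedback, expert pool changes every round."""
    rng = random.Random(seed)
    # Hypothetical experts with fixed accuracies (assumption, not from the paper).
    expert_acc = {"e1": 0.9, "e2": 0.6, "e3": 0.75}
    model_acc = 0.7  # hypothetical accuracy of the predictive model
    # Per-action running mean loss and number of times the action was chosen.
    stats = {"model": [0.0, 0]}
    total_loss = 0
    for _ in range(T):
        # Each expert is independently available with probability 0.5.
        available = [e for e in expert_acc if rng.random() < 0.5]
        actions = ["model"] + available
        for a in actions:
            stats.setdefault(a, [0.0, 0])
        if rng.random() < eps:
            # Explore: pick uniformly among the currently available actions.
            action = rng.choice(actions)
        else:
            # Exploit: lowest estimated loss; untried actions start at 0.0.
            action = min(actions, key=lambda a: stats[a][0])
        # Bandit feedback: only the chosen action's 0-1 loss is observed.
        acc = model_acc if action == "model" else expert_acc[action]
        loss = 0 if rng.random() < acc else 1
        mean, count = stats[action]
        stats[action] = [mean + (loss - mean) / (count + 1), count + 1]
        total_loss += loss
    return total_loss / T, stats


if __name__ == "__main__":
    avg_loss, stats = simulate_online_l2d()
    print(f"average loss: {avg_loss:.3f}")
    for name, (mean, count) in sorted(stats.items()):
        print(f"{name}: estimated loss {mean:.3f} over {count} rounds")
```

The sketch only shows the shape of the problem: the action set differs from round to round, and the learner never sees what the actions it did not choose would have cost. The regret guarantees quoted above belong to the paper's method, not to this epsilon-greedy stand-in.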
Problem

Research questions and friction points this paper is trying to address.

Online Learning-to-Defer
Varying Experts
Bandit Feedback
Dynamic Expert Availability
Streaming Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Learning-to-Defer
Bandit Feedback
Dynamic Experts
Regret Bound
ℋ-consistency