🤖 AI Summary
This work addresses the challenge of modeling non-stationary time series when expert availability varies over time and only partial feedback is observed. To this end, the authors propose L2D-SLDS, a factored switching linear-Gaussian state-space model that jointly captures shared global dynamics and expert-specific latent states. The approach models signed expert residuals and enables context-aware dynamic routing, while supporting online expert registration and pruning. A routing policy based on information-directed sampling (IDS) is introduced to balance predictive accuracy against the information gained from querying individual experts. Experimental results show that the proposed method outperforms contextual multi-armed bandit baselines and an ablated variant without the shared factor.
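The factored structure can be illustrated with a minimal generative sketch: a discrete regime drives both a shared global factor and per-expert idiosyncratic states, and each expert's signed residual mixes the two. All dimensions, dynamics coefficients, and noise levels below are hypothetical placeholders rather than values from the paper, and the regime transitions are kept context-independent for brevity (the model itself uses context-dependent transitions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not specified in the summary): regimes, experts, steps.
K, E, T = 2, 3, 100

# Regime transition matrix (context-independent here for brevity).
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])

# Per-regime AR coefficients for the shared global factor and the
# per-expert idiosyncratic states (illustrative values only).
A_g = np.array([0.9, -0.5])
A_e = np.array([0.8, 0.3])
q_g, q_e, r = 0.1, 0.2, 0.05   # process / observation noise variances

z = 0                           # latent regime
g = 0.0                         # shared global factor
x = np.zeros(E)                 # per-expert idiosyncratic states
residuals = np.zeros((T, E))    # signed expert residuals

for t in range(T):
    z = rng.choice(K, p=P[z])                      # regime switch
    g = A_g[z] * g + rng.normal(0, np.sqrt(q_g))   # shared dynamics
    x = A_e[z] * x + rng.normal(0, np.sqrt(q_e), E)
    # Each expert's signed residual combines the shared factor with
    # its own latent state plus observation noise.
    residuals[t] = g + x + rng.normal(0, np.sqrt(r), E)
```

Because every expert's residual loads on the same global factor, observing one queried expert's residual still carries information about the others, which is what enables cross-expert information transfer under partial feedback.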
📝 Abstract
We study Learning to Defer for non-stationary time series with partial feedback and time-varying expert availability. At each time step, the router selects an available expert, observes the target, and sees only the queried expert's prediction. We model signed expert residuals using L2D-SLDS, a factorized switching linear-Gaussian state-space model with context-dependent regime transitions, a shared global factor enabling cross-expert information transfer, and per-expert idiosyncratic states. The model supports expert entry and pruning via a dynamic registry. Using one-step-ahead predictive beliefs, we propose an IDS-inspired routing rule that trades off predicted cost against information gained about the latent regime and shared factor. Experiments show improvements over contextual-bandit baselines and a no-shared-factor ablation.
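As a concrete illustration of the cost-versus-information tradeoff, the sketch below implements the standard information-directed sampling rule of Russo and Van Roy: minimize the information ratio, squared expected regret over expected information gain, across distributions supported on at most two experts. The inputs `pred_cost` and `info_gain` stand in for the model's one-step-ahead predicted cost and the expected information gained about the latent regime and shared factor; the paper's exact rule may differ, so treat this as a sketch rather than the authors' implementation.

```python
import numpy as np

def ids_route(pred_cost, info_gain, rng):
    """Route to one expert by information-directed sampling.

    pred_cost : predicted one-step cost per available expert
    info_gain : expected information gain (about the regime and
                shared factor) from querying each expert; >= 0
    Minimizes (expected regret)^2 / (expected info gain) over
    mixtures of at most two experts, then samples the choice.
    """
    cost = np.asarray(pred_cost, float)
    gain = np.asarray(info_gain, float)
    delta = cost - cost.min()            # expected one-step regret
    ps = np.linspace(0.0, 1.0, 101)      # candidate mixing weights
    best_ratio, choice = np.inf, (0, 0, 1.0)
    for a in range(len(cost)):
        for b in range(len(cost)):
            d = ps * delta[a] + (1.0 - ps) * delta[b]
            g = ps * gain[a] + (1.0 - ps) * gain[b]
            with np.errstate(divide="ignore", invalid="ignore"):
                ratio = d ** 2 / g
            # Zero gain: ratio is 0 if regret is also 0, else infinite.
            ratio = np.where(g > 0, ratio, np.where(d == 0, 0.0, np.inf))
            i = int(np.argmin(ratio))
            if ratio[i] < best_ratio:
                best_ratio, choice = ratio[i], (a, b, ps[i])
    a, b, p = choice
    return a if rng.random() < p else b  # p is the weight on expert a
```

With two experts where one is cheaper but the other is far more informative about the latent state, the minimizing mixture places mass on both, so the router occasionally queries the costlier expert to sharpen its belief instead of always deferring to the current favorite.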