Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses reward signals that exhibit both low-rank structure and non-stationary subspace drift, as arises in recommendation, clinical dosing, and ad targeting. We study a piecewise-stationary low-rank linear contextual bandit problem and propose an algorithm that combines isotropic exploration, windowed projected ridge-UCB, and CUSUM-based change-point detection to track both the unknown segment boundaries and the evolving subspace. To our knowledge for the first time in this setting, we establish identifiability conditions for the moving subspace and derive a dynamic regret bound whose leading term scales with the intrinsic rank \( r \) rather than the ambient dimension \( d \), avoiding the \( d\sqrt{T} \) rate paid by existing non-stationary linear bandit methods. Experiments on 11 benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and production-log data show that the method outperforms non-stationary and low-rank baselines when \( d - r \gtrsim T^{1/6} \), consistent with the analytical crossover.
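One heuristic way to read the threshold \( d - r \gtrsim T^{1/6} \) is to ask when the ambient rate exceeds the rank-dependent bound stated in the abstract. This is a back-of-envelope comparison, not the paper's own derivation: it assumes the drift term \( O(W\,V_{\mathrm{in}}) \) is negligible and ignores constants and logarithmic factors.

\[
d\sqrt{T} \;\gtrsim\; r\sqrt{T} + T^{2/3}
\;\Longleftrightarrow\;
(d - r)\sqrt{T} \;\gtrsim\; T^{2/3}
\;\Longleftrightarrow\;
d - r \;\gtrsim\; T^{2/3 - 1/2} = T^{1/6}.
\]

For instance, at \( T = 10^6 \) the threshold is \( T^{1/6} = 10 \), so under these assumptions the rank-aware bound is the smaller one once the ambient dimension exceeds the rank by roughly ten.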
📝 Abstract
Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $\theta_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.
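The three ingredients named above (isotropic probes, windowed projected ridge-UCB, and a CUSUM-style detector) can be illustrated with a minimal NumPy sketch. This is not the authors' SPSC implementation. The function and parameter names (`estimate_subspace`, `projected_ridge_ucb`, `Cusum`, `lam`, `beta`, `slack`) are hypothetical, the subspace estimator is one simple quadratic functional of rewards chosen for illustration (top eigenvectors of a reward-weighted second moment, assuming standard Gaussian probes), and the detector is the textbook two-sided CUSUM rather than the paper's variant.

```python
import numpy as np


def estimate_subspace(X, y, r):
    """Return a d x r orthonormal basis estimated from probe rounds.

    Assumption (not from the source): probes x are standard Gaussian, so
    mean(y^2 * x x^T) is a multiple of the identity plus a rank-r term
    whose column space is the reward subspace.
    """
    M = (X * (y ** 2)[:, None]).T @ X / len(y)
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues are returned in ascending order
    return eigvecs[:, -r:]          # keep the top-r eigenvectors


def projected_ridge_ucb(B_hat, X_win, y_win, arms, lam=1.0, beta=1.0):
    """Ridge-UCB scores computed in the r-dimensional projected space.

    X_win and y_win hold only the most recent rounds (the sliding window).
    """
    Z = X_win @ B_hat                       # projected window contexts, (W, r)
    A = Z.T @ Z + lam * np.eye(Z.shape[1])  # r x r regularized Gram matrix
    w_hat = np.linalg.solve(A, Z.T @ y_win)
    Za = arms @ B_hat                       # projected candidate arms, (n_arms, r)
    quad = np.einsum("ij,jk,ik->i", Za, np.linalg.inv(A), Za)
    return Za @ w_hat + beta * np.sqrt(np.maximum(quad, 0.0))


class Cusum:
    """Two-sided textbook CUSUM on scalar prediction residuals."""

    def __init__(self, slack, threshold):
        self.slack = slack          # per-step allowance before evidence accumulates
        self.threshold = threshold  # alarm level for the accumulated statistic
        self.pos = 0.0
        self.neg = 0.0

    def update(self, residual):
        """Feed one residual; return True when a change is flagged."""
        self.pos = max(0.0, self.pos + residual - self.slack)
        self.neg = max(0.0, self.neg - residual - self.slack)
        return max(self.pos, self.neg) > self.threshold

    def reset(self):
        """Clear the accumulated statistics, e.g. after an alarm."""
        self.pos = self.neg = 0.0
```

In a loop built on this sketch, one would re-estimate `B_hat` from recent probe rounds, pick the arm with the largest score from `projected_ridge_ucb`, feed the prediction residual to `Cusum.update`, and on an alarm call `reset` and discard the window. Working in the projected space keeps the Gram matrix at size r x r, which is where a dependence on r rather than d would enter.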
Problem

Research questions and friction points this paper is trying to address.

low-rank bandits
non-stationarity
subspace drift
dynamic regret
contextual bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

low-rank bandits
non-stationary subspace
dynamic regret
identification boundary
contextual bandits
🔎 Similar Papers
2024-10-02, International Conference on Machine Learning, Citations: 1
2024-02-05, arXiv.org, Citations: 1