๐ค AI Summary
In multi-task linear contextual bandits, high-dimensional contexts incur substantial sample and computational overhead.
Method: This paper investigates efficient learning under a shared low-dimensional linear representation. We propose a unified framework combining alternating projected gradient descent with a minimization estimator, enabling the first theoretically guaranteed recovery of low-rank feature matrices under stochastic context assumptions.
Contribution/Results: Our analysis establishes rigorous regret convergence guarantees for multi-task learning, with an upper bound that strictly improves upon single-task baselines. Experiments demonstrate 3โ5ร higher sample efficiency and significantly accelerated convergence across multiple tasks. The core innovation lies in tightly coupling low-rank structural priors with online decision dynamicsโachieving both statistical efficiency and computational tractability without compromising theoretical soundness.
๐ Abstract
We study how representation learning can improve the learning efficiency of contextual bandit problems. We study the setting where we play T contextual linear bandits with dimension d simultaneously, and these T bandit tasks collectively share a common linear representation with a dimensionality of r much smaller than d. We present a new algorithm based on alternating projected gradient descent (GD) and minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We presented experiments and compared the performance of our algorithm against benchmark algorithms.