🤖 AI Summary
This paper studies lifelong learning for sequential multi-task linear bandits under low-rank structure, addressing realistic settings where tasks are numerous, lack diversity, and share parameters in a low-dimensional subspace. We propose the first provably sound low-rank representation transfer theory that does not require the uniform task coverage assumption. Our method introduces a sequential multi-task algorithm with rigorous regret upper bounds, leveraging low-rank modeling, confidence ellipsoid estimation over ellipsoidal action sets, progressive representation updates, and cross-task parameter sharing to enable efficient representation learning and transfer. The theoretical regret bound is $\tilde{O}(Nm\sqrt{\tau} + N^{2/3}\tau^{2/3}dm^{1/3} + Nd^2 + \tau md)$, which significantly improves upon non-low-rank baselines. Synthetic experiments empirically validate the superiority of our approach.
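The shared-structure assumption above can be made concrete with a small sketch: each task parameter $\theta_n = B w_n$ lies in the column span of a single $d \times m$ basis $B$, so the matrix of all task parameters has rank at most $m$. The dimensions and the NumPy-based generation below are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, N = 20, 3, 50  # ambient dim, subspace dim, number of tasks (illustrative values)

# Shared low-rank representation: an orthonormal basis B of an m-dimensional
# subspace of R^d (B^T B = I_m).
B, _ = np.linalg.qr(rng.standard_normal((d, m)))

# Each task's parameter theta_n = B @ w_n lies in the span of B's columns.
W = rng.standard_normal((m, N))
Theta = B @ W  # d x N matrix of task parameters, rank at most m

print(np.linalg.matrix_rank(Theta))  # at most m, even though Theta is d x N
```

Learning the basis $B$ once and reusing it across tasks is what lets per-task estimation happen in $m$ rather than $d$ dimensions.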
📝 Abstract
We study lifelong learning in linear bandits, where a learner interacts with a sequence of linear bandit tasks whose parameters lie in an $m$-dimensional subspace of $\mathbb{R}^d$, thereby sharing a low-rank representation. Current literature typically assumes that the tasks are diverse, i.e., their parameters uniformly span the $m$-dimensional subspace. This assumption allows the low-rank representation to be learned before all tasks are revealed, which can be unrealistic in real-world applications. In this work, we present the first nontrivial result for sequential multi-task linear bandits without the task diversity assumption. We develop an algorithm that efficiently learns and transfers low-rank representations. When facing $N$ tasks, each played over $\tau$ rounds, our algorithm achieves a regret guarantee of $\tilde{O}\big(Nm\sqrt{\tau} + N^{\frac{2}{3}}\tau^{\frac{2}{3}} d m^{\frac{1}{3}} + Nd^2 + \tau m d\big)$ under the ellipsoid action set assumption. This result can significantly improve upon the baseline of $\tilde{O}\left(Nd\sqrt{\tau}\right)$ that does not leverage the low-rank structure when the number of tasks $N$ is sufficiently large and $m \ll d$. We also demonstrate empirically on synthetic data that our algorithm outperforms baseline algorithms that rely on the task diversity assumption.