Multi-Task Representation Learning for Conservative Linear Bandits

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of efficiently learning a shared low-dimensional representation under safety constraints in multi-task linear bandits. The authors propose the CMTRL framework, which for the first time integrates conservative bandit algorithms with multi-task low-rank representation learning. Central to this framework is the Safe-AltGDmin algorithm, which combines alternating projected gradient descent with constrained optimization to recover a shared rank-r feature matrix and learn task-specific policies at the same time. The theoretical analysis gives upper bounds on both regret and sample complexity, and the empirical results show the proposed method significantly outperforming existing baselines while satisfying the conservative safety constraints.
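To make "conservative" concrete, here is a minimal sketch of one formulation that is common in the conservative bandit literature: at every round, the learner's cumulative reward must stay at or above a (1 - alpha) fraction of a baseline policy's cumulative reward. This summary does not state the paper's exact constraint, so the function name `is_conservative`, the parameter `alpha`, and the constraint form itself are assumptions made for illustration.

```python
import numpy as np

def is_conservative(rewards, baseline_rewards, alpha):
    """Return True if, at every round, the cumulative reward is at least
    (1 - alpha) times the baseline's cumulative reward.

    Assumed constraint form (not taken from the paper):
        sum_{s<=t} reward_s >= (1 - alpha) * sum_{s<=t} baseline_s  for all t.
    """
    cum = np.cumsum(np.asarray(rewards, dtype=float))
    base = np.cumsum(np.asarray(baseline_rewards, dtype=float))
    return bool(np.all(cum >= (1.0 - alpha) * base))

# Hypothetical reward sequences for a single task.
ok = is_conservative([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], alpha=0.1)
bad = is_conservative([0.0, 1.0, 1.0], [1.0, 1.0, 1.0], alpha=0.1)
```

In the second call the learner earns nothing in the first round, so the constraint is already violated at round one even though later rounds match the baseline.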

📝 Abstract

This paper presents the Constrained Multi-Task Representation Learning (CMTRL) framework for linear bandits. We consider T linear bandit tasks in a d-dimensional space that share a common low-dimensional representation of dimension r, where r is much smaller than the minimum of d and T. Furthermore, the tasks are constrained so that only actions meeting specific safety or performance requirements are allowed; such problems are referred to as conservative (safe) bandits. We introduce a novel algorithm, Safe-Alternating projected Gradient Descent and minimization (Safe-AltGDmin), to recover a low-rank feature matrix while satisfying the given constraints. Building on this algorithm, we propose a multi-task representation learning framework for conservative linear bandits and establish theoretical bounds on its regret and sample complexity. We present experiments and compare the performance of our algorithm with benchmark algorithms.
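The low-rank structure described above can be sketched by stacking the T task parameters into a d x T matrix that factors as B W, with B of size d x r (shared) and W of size r x T (task-specific). The code below is a plain, unconstrained alternating scheme (least squares for W, one projected gradient step for B) on noiseless synthetic data. It is not the paper's Safe-AltGDmin: the safety constraints are omitted, and the function name `altgdmin_sketch`, the spectral-style initialization, the step size, and the Gaussian actions are all assumptions made for illustration.

```python
import numpy as np

def altgdmin_sketch(X, Y, r, n_iters=200):
    """Unconstrained alternating GD + minimization sketch (assumed setup).

    X: (T, n, d) actions per task, Y: (T, n) rewards.
    Returns B (d, r) with orthonormal columns and W (r, T).
    """
    T, n, d = X.shape

    def solve_w(B):
        # Minimization step: per-task least squares for w_t given B.
        return np.stack(
            [np.linalg.lstsq(X[t] @ B, Y[t], rcond=None)[0] for t in range(T)],
            axis=1,
        )

    # Spectral-style initialization (assumption): top-r left singular
    # vectors of the crude per-task estimates X_t^T y_t / n.
    theta0 = np.stack([X[t].T @ Y[t] / n for t in range(T)], axis=1)
    U, _, _ = np.linalg.svd(theta0, full_matrices=False)
    B = U[:, :r]

    for _ in range(n_iters):
        W = solve_w(B)
        # Gradient of 0.5 * sum_t ||X_t B w_t - y_t||^2 with respect to B.
        grad = np.zeros((d, r))
        for t in range(T):
            resid = X[t] @ (B @ W[:, t]) - Y[t]
            grad += np.outer(X[t].T @ resid, W[:, t])
        step = 0.4 / (n * np.linalg.norm(W, 2) ** 2)  # heuristic step size
        # Projection step: re-orthonormalize the columns via QR.
        B, _ = np.linalg.qr(B - step * grad)
    return B, solve_w(B)

# Synthetic noiseless example with hypothetical sizes.
rng = np.random.default_rng(1)
d, T, r, n = 20, 40, 2, 50
B_true, _ = np.linalg.qr(rng.standard_normal((d, r)))
W_true = rng.standard_normal((r, T))
X = rng.standard_normal((T, n, d))
Y = np.einsum("tnd,dt->tn", X, B_true @ W_true)

B_hat, W_hat = altgdmin_sketch(X, Y, r)
# Subspace distance between the true and recovered column spans.
dist = np.linalg.norm((np.eye(d) - B_true @ B_true.T) @ B_hat, 2)
```

The point of the factorization is the parameter count: learning each task separately needs d T numbers, while the shared representation needs only d r + r T, which is far smaller when r is much less than min(d, T).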

Problem

Research questions and friction points this paper is trying to address.

multi-task representation learning

conservative linear bandits

low-dimensional representation

safety constraints

linear bandits

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task representation learning

conservative bandits

low-rank recovery