Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints

📅 2024-01-21

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper studies conservative multi-task learning for distributed, heterogeneous agents in stochastic linear contextual bandits. Each agent knows only the context distribution, not the instantaneous context, and must satisfy per-stage performance constraints. The authors extend conservative linear bandits to the distributed context-distribution setting and propose DiSC-UCB, a framework that combines distribution-aware UCB, per-round action-set pruning, and structured synchronization through a central server. They also design DiSC-UCB2, an extension for the setting where agents do not know the baseline reward. Both algorithms achieve near-optimal regret of $\tilde{O}(\sqrt{T})$ and communication complexity of $O(\log T)$. Experiments on synthetic data and MovieLens-100K empirically validate that the performance constraints are satisfied and show gains from multi-task learning.

📝 Abstract

We present conservative distributed multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where M agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that rely on a prediction mechanism to infer context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. We prove regret and communication bounds for the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. For this setting, we provide a modified algorithm, DiSC-UCB2, and we show that the modified algorithm achieves the same regret and communication bounds. We empirically validate the performance of our algorithm on synthetic data and real-world MovieLens-100K data.
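
The pruned-action-set idea in the abstract can be illustrated with a minimal single-agent sketch. This is not the paper's DiSC-UCB: the function name `pruned_ucb_action`, the confidence radius `beta`, and the specific safety test (lower confidence bound at least `(1 - alpha)` times the baseline reward, the usual form of a stage-wise conservative constraint) are assumptions made for illustration.

```python
import numpy as np

def pruned_ucb_action(psi, V, b, beta, baseline_reward, alpha):
    """Pick the most optimistic action among those that look safe.

    psi: (n_actions, d) feature vectors, one row per action
    V: (d, d) regularized Gram matrix; b: (d,) reward-weighted feature sum
    beta: confidence radius (assumed given); alpha: allowed fractional loss
    Returns an action index, or None if no action passes the safety test.
    """
    theta_hat = np.linalg.solve(V, b)            # ridge estimate of theta
    V_inv = np.linalg.inv(V)
    # Confidence width beta * ||x||_{V^{-1}} for every action x.
    widths = beta * np.sqrt(np.einsum("ad,de,ae->a", psi, V_inv, psi))
    means = psi @ theta_hat
    # Pruning step: keep actions whose pessimistic reward clears the constraint.
    safe = (means - widths) >= (1.0 - alpha) * baseline_reward
    if not safe.any():
        return None                              # caller plays the baseline action
    ucb = np.where(safe, means + widths, -np.inf)
    return int(np.argmax(ucb))
```

In this sketch an action with a large upper bound but a poor lower bound is excluded before the UCB maximization, which is the sense in which the pruning enforces the constraint in every round rather than on average.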

Problem

Research questions and friction points this paper is trying to address.

Distributed multi-task learning under context distributions and stage-wise constraints

Heterogeneous agents adhering to stage-wise performance constraints

Unknown exact context, only context distribution available
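
When only the context distribution is known, a standard device in the context-distribution bandit literature is to replace each action's feature vector with its expectation over contexts. The sketch below assumes a finite set of contexts and a hypothetical feature tensor `phi`; whether the paper uses exactly this construction is an assumption here.

```python
import numpy as np

def expected_features(phi, context_probs):
    """Average feature vectors over a known, finite context distribution.

    phi: (n_contexts, n_actions, d) features phi[c, a] for context c, action a
    context_probs: (n_contexts,) probabilities summing to 1
    Returns an (n_actions, d) array of expected feature vectors.
    """
    # Contract the context axis: sum_c p(c) * phi[c, a, :].
    return np.tensordot(context_probs, phi, axes=1)

# Two contexts, two actions, two feature dimensions (toy numbers).
phi = np.array([[[1.0, 0.0], [0.0, 1.0]],
                [[0.0, 1.0], [1.0, 0.0]]])
psi = expected_features(phi, np.array([0.25, 0.75]))
```

The resulting rows of `psi` can then be fed to a linear bandit algorithm in place of the unobserved exact features.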

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed UCB algorithm (DiSC-UCB) for linear contextual bandits with context distributions

Pruned action set, built each round, ensures the stage-wise constraints are met

Synchronized sharing via central server
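
One common way to keep server communication infrequent in distributed linear bandits is a determinant-based trigger: an agent requests synchronization only when its local Gram matrix has grown enough since the last sync. The sketch below shows that generic rule. The function name `should_sync` and the `threshold` parameter are hypothetical, and it is an assumption that DiSC-UCB's synchronization steps take this form.

```python
import numpy as np

def should_sync(V_local, V_last_sync, rounds_since_sync, threshold):
    """Return True when the agent should upload its statistics to the server.

    V_local: (d, d) current local Gram matrix
    V_last_sync: (d, d) Gram matrix at the last synchronization
    rounds_since_sync: number of rounds played since that synchronization
    threshold: tuning constant controlling how often syncs happen
    """
    # slogdet returns (sign, log|det|); log-determinants avoid overflow.
    _, logdet_now = np.linalg.slogdet(V_local)
    _, logdet_last = np.linalg.slogdet(V_last_sync)
    return bool(rounds_since_sync * (logdet_now - logdet_last) > threshold)
```

A larger `threshold` means fewer synchronizations and staler shared estimates, so the constant trades communication against regret.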

🔎 Similar Papers

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits