Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints

📅 2024-01-21
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This paper studies conservative multi-task learning for distributed heterogeneous agents in stochastic linear contextual bandits: each agent knows only the context distribution—not the instantaneous context—and must satisfy per-stage performance constraints. To this end, we first extend conservative linear bandits to the distributed context-distribution setting, proposing DiSC-UCB—a unified framework integrating distribution-aware UCB, dynamic action-set pruning, and structured server-coordinated synchronization. We further design DiSC-UCB2, an adaptive extension that eliminates dependence on baseline rewards. Theoretically, both algorithms achieve near-optimal regret bounds of $\tilde{O}(\sqrt{T})$ and low communication complexity of $O(\log T)$. Experiments on synthetic data and MovieLens-100K empirically validate strict satisfaction of performance constraints and demonstrate synergistic multi-task gains.

📝 Abstract
We present conservative distributed multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where M agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that involve a prediction mechanism to infer the context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. We prove regret and communication bounds for the algorithm. We then extend the problem to a setting where the agents are unaware of the baseline reward. For this setting, we provide a modified algorithm, DiSC-UCB2, and show that it achieves the same regret and communication bounds. We empirically validate the performance of our algorithms on synthetic data and real-world MovieLens-100K data.
Problem

Research questions and friction points this paper is trying to address.

Distributed multi-task learning with context distribution constraints
Heterogeneous agents adhering to stage-wise performance constraints
Unknown exact context, only context distribution available
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed UCB algorithm for bandits
Pruned action set ensures constraints
Synchronized sharing via central server
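The pruned-action-set idea above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual algorithm: it assumes each arm is summarized by its expected feature vector under the known context distribution, uses a standard ridge-regression confidence ellipsoid, and keeps only arms whose pessimistic (lower-confidence) reward already meets the stage-wise constraint relative to a known baseline. The function name `disc_ucb_step` and the parameter `alpha` (allowed fractional loss versus the baseline) are assumptions for illustration.

```python
import numpy as np

def disc_ucb_step(features, theta_hat, V_inv, beta, baseline_reward, alpha=0.1):
    """One stage of a conservative, distribution-aware UCB choice (illustrative).

    features        : (K, d) expected feature vectors E[x | arm], standing in
                      for the known context distribution of each arm.
    theta_hat       : current ridge-regression estimate of the unknown parameter.
    V_inv           : inverse of the regularized Gram matrix.
    beta            : confidence-ellipsoid radius.
    baseline_reward : known per-stage baseline reward r_b.
    alpha           : allowed fractional loss relative to the baseline.
    """
    means = features @ theta_hat
    # Confidence width per arm: beta * sqrt(phi^T V^{-1} phi).
    widths = beta * np.sqrt(np.einsum("ki,ij,kj->k", features, V_inv, features))
    ucb, lcb = means + widths, means - widths

    # Prune: keep only arms whose pessimistic reward already satisfies
    # the stage-wise constraint  r_t >= (1 - alpha) * r_b.
    safe = lcb >= (1 - alpha) * baseline_reward
    if not safe.any():
        return None  # no arm is provably safe: fall back to the baseline action

    # Optimism among the safe arms: play the one with the largest UCB.
    candidates = np.where(safe, ucb, -np.inf)
    return int(np.argmax(candidates))

# Example: with a tight confidence set only arm 0 is both safe and optimistic.
features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
theta = np.array([1.0, 0.0])
V_inv = 0.01 * np.eye(2)
print(disc_ucb_step(features, theta, V_inv, beta=1.0, baseline_reward=0.5))  # 0
```

In the distributed version described by the paper, agents would additionally synchronize their Gram matrices and estimates through the central server at structured synchronization steps; that coordination is omitted here.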
Jiabin Lin
Iowa State University
machine learning · bandit problem · distributed systems
Shana Moothedath
Department of Electrical Engineering, Iowa State University