Exploiting Adjacent Similarity in Multi-Armed Bandit Tasks via Transfer of Reward Samples

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses sequential multi-task multi-armed bandits, exploiting inter-task adjacency similarity (bounded differences between the mean rewards of consecutive tasks) to transfer reward samples across tasks and reduce cumulative regret. We propose UCB-based cross-task sample-transfer algorithms for both the known and unknown similarity-parameter settings, and prove upper bounds on total regret showing that transfer mitigates regret growth relative to no transfer. The method combines an enhanced UCB strategy, adjacency-aware similarity modeling, and reward-sample reuse. Theoretical analysis establishes regret bounds that strictly improve on standard UCB under the similarity assumption, and empirical evaluations confirm consistent gains over standard UCB and a naive transfer baseline across diverse task sequences. Collectively, these results validate the advantage of structurally informed, similarity-driven sample transfer in multi-task bandit learning.

📝 Abstract
We consider a sequential multi-task problem, where each task is modeled as a stochastic multi-armed bandit with K arms. We assume the bandit tasks are adjacently similar in the sense that the difference between the mean rewards of the arms for any two consecutive tasks is bounded by a parameter. We propose two algorithms (one assumes the parameter is known while the other does not) based on UCB to transfer reward samples from preceding tasks to improve the overall regret across all tasks. Our analysis shows that transferring samples reduces the regret as compared to the case of no transfer. We provide empirical results for our algorithms, which show performance improvement over the standard UCB algorithm without transfer and a naive transfer algorithm.
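The known-parameter idea in the abstract can be sketched in a few lines: run UCB1 on each task, but seed every arm with the reward samples collected in the preceding task, and widen that arm's index by the similarity parameter d to account for the (bounded) drift in mean rewards between adjacent tasks. This is a minimal illustrative sketch, not the paper's exact algorithm — the function name, the naive full-sample transfer, and the simple additive +d inflation are assumptions for illustration.

```python
import math
import random

def ucb_with_transfer(arms, horizon, transferred, d, rng):
    """One bandit task: UCB1 where each arm starts with reward samples
    transferred from the preceding task. Since adjacent tasks' mean
    rewards differ by at most d, the index of each arm is inflated by d
    so that the transferred samples remain (conservatively) valid."""
    K = len(arms)
    counts = [len(s) for s in transferred]        # pulls, incl. transferred samples
    sums = [sum(s) for s in transferred]
    history = [list(s) for s in transferred]      # samples to pass to the next task
    for t in range(1, horizon + 1):
        def index(a):
            if counts[a] == 0:
                return float("inf")               # force one pull of each arm
            bonus = math.sqrt(2.0 * math.log(t) / counts[a])
            return sums[a] / counts[a] + bonus + d  # +d covers inter-task drift
        a = max(range(K), key=index)
        r = arms[a](rng)                          # stochastic reward from arm a
        counts[a] += 1
        sums[a] += r
        history[a].append(r)
    return history

# Two adjacent tasks whose mean rewards differ by at most d = 0.05
rng = random.Random(0)
tasks = [[0.2, 0.8], [0.25, 0.75]]
samples = [[] for _ in tasks[0]]                  # nothing to transfer into task 1
for means in tasks:
    arms = [lambda r, m=m: 1.0 if r.random() < m else 0.0 for m in means]
    samples = ucb_with_transfer(arms, 200, samples, d=0.05, rng=rng)
```

After the loop, `samples` holds all 400 rewards (200 per task), partitioned by arm; the second task starts from the first task's samples instead of cold.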
Problem

Research questions and friction points this paper is trying to address.

Sequential multi-task learning with adjacent similarity
Transfer of reward samples to reduce regret
Comparison of algorithms with and without transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

UCB-based algorithms for multi-armed bandit tasks
Transfer reward samples between adjacent tasks
Improve regret by leveraging adjacent similarity
NR Rahul
Department of Electrical Communication Engineering (ECE) at the Indian Institute of Science, Bengaluru, India
Vaibhav Katewa
Robert Bosch Center for Cyber-Physical Systems and the Department of ECE at the Indian Institute of Science, Bengaluru, India