Adaptive Sample Sharing for Multi Agent Linear Bandits

📅 2023-09-15

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper addresses collaborative regret minimization in multi-agent linear bandits without structural assumptions on the bandit parameters, focusing on how data sharing among agents affects joint learning efficiency. It proposes the Bandit Adaptive Sample Sharing (BASS) algorithm, which rests on a formal characterization of the trade-off between estimation bias and uncertainty and adapts sample sharing to identify latent cluster structure among agent parameters. Theoretically, BASS is shown to improve on the cumulative regret guarantees of existing state-of-the-art methods. Empirically, it reduces regret on both synthetic and real-world datasets while accurately recovering the underlying parameter clusters. Its core innovation is that it does not rely on restrictive model structures, such as sparsity or low-rankness, which makes it applicable to a broader range of distributed linear bandit problems.

📝 Abstract

In the well-known multi-agent linear bandit setting, designing efficient collaboration between agents remains challenging. This paper studies the impact of data sharing among agents on regret minimization. Unlike most existing approaches, our contribution does not rely on any assumption about the structure of the bandit parameters. Our main result formalizes the trade-off between the bias and the uncertainty of the bandit parameter estimation for efficient collaboration. This result is the cornerstone of the Bandit Adaptive Sample Sharing (BASS) algorithm, whose efficiency over the current state-of-the-art is validated through theoretical analysis and empirical evaluations on both synthetic and real-world datasets. Furthermore, we demonstrate that, when agents' parameters display a cluster structure, our algorithm accurately recovers it.
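The bias-uncertainty trade-off mentioned in the abstract can be illustrated with a small numerical sketch. This is not the BASS algorithm: it only shows, under assumed settings, what happens to a ridge-regression estimate when one agent pools its samples with another agent's. The function names, the ridge parameter `lam`, the Gaussian feature and noise model, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np


def ridge_estimate(X, y, lam=1.0):
    """Return the ridge estimate and the regularized Gram matrix V."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)
    theta_hat = np.linalg.solve(V, X.T @ y)
    return theta_hat, V


def confidence_width(V, x):
    """Uncertainty along direction x, measured as sqrt(x^T V^{-1} x)."""
    return float(np.sqrt(x @ np.linalg.solve(V, x)))


def simulate(theta_own, theta_other, n=50, noise=0.1, seed=0):
    """Compare an agent's solo estimate with one pooled with another agent.

    Hypothetical setup: both agents observe n Gaussian feature vectors and
    noisy linear rewards generated by their own parameter vector.
    """
    rng = np.random.default_rng(seed)
    d = theta_own.shape[0]
    X_own = rng.normal(size=(n, d))
    X_other = rng.normal(size=(n, d))
    y_own = X_own @ theta_own + noise * rng.normal(size=n)
    y_other = X_other @ theta_other + noise * rng.normal(size=n)

    # Estimate from the agent's own samples only.
    theta_solo, V_solo = ridge_estimate(X_own, y_own)

    # Estimate from the pooled samples of both agents.
    X_pool = np.vstack([X_own, X_other])
    y_pool = np.concatenate([y_own, y_other])
    theta_pool, V_pool = ridge_estimate(X_pool, y_pool)

    x = np.ones(d) / np.sqrt(d)  # arbitrary unit direction for the width
    return {
        "solo_error": float(np.linalg.norm(theta_solo - theta_own)),
        "pooled_error": float(np.linalg.norm(theta_pool - theta_own)),
        "solo_width": confidence_width(V_solo, x),
        "pooled_width": confidence_width(V_pool, x),
    }


if __name__ == "__main__":
    same = simulate(np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
    diff = simulate(np.array([1.0, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]))
    print("identical parameters:", same)
    print("different parameters:", diff)
```

In this sketch pooling always shrinks the confidence width, because the pooled Gram matrix dominates the solo one. When the two parameter vectors differ, however, the pooled estimate is pulled toward the other agent's parameter and its error grows. Deciding when the reduced uncertainty outweighs the added bias is the kind of question the paper's main result formalizes.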

Problem

Research questions and friction points this paper is trying to address.

Impact of data sharing on multi-agent regret minimization

Bias-uncertainty trade-off in bandit parameter estimation

Cluster structure recovery in agent parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive sample sharing without parameter assumptions

Balances bias and uncertainty for collaboration

Cluster structure recovery in multi-agent settings

🔎 Similar Papers

Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits