Multi-Player Approaches for Dueling Bandits

📅 2024-05-25

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 1

career value

240K/year

🤖 AI Summary

This work addresses the challenge of collaborative exploration of non-informative arm pairs in multi-player preference-feedback dueling bandits. Methodologically, it introduces the first theoretically optimal distributed solution: (i) it adapts the Follow-Your-Leader black-box framework to the multi-player dueling setting, achieving the fundamental regret lower bound; and (ii) it designs a distributed protocol leveraging message passing and Condorcet winner recommendation to enable efficient coordination. The approach integrates techniques from multi-agent reinforcement learning, Double Thompson Sampling, and distributed decision-making. Empirical evaluation across multiple benchmark tasks demonstrates that the proposed algorithm significantly outperforms single-player baselines—achieving 37%–62% faster convergence and reducing cumulative regret by 41%–58%. These results validate its dual advantage: rigorous theoretical guarantees and superior practical performance.

Technology Category

Application Category

📝 Abstract

Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, common in scenarios with only preference-based information like human feedback, introduces challenges related to controlling collaborative exploration of non-informative arm pairs, but has received little attention. To fill this gap, we demonstrate that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting when utilizing known dueling bandit algorithms as a foundation. Additionally, we analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol, resulting in expedited exploration in many cases. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of the multiplayer dueling bandit setting.

Problem

Research questions and friction points this paper is trying to address.

Addressing multiplayer dueling bandits with preference-based feedback

Optimizing collaborative exploration of non-informative arm pairs

Enhancing distributed algorithms for Condorcet-winner recommendation protocols

Innovation

Methods, ideas, or system contributions that make the work stand out.

Follow Your Leader black-box approach

message-passing fully distributed approach

Condorcet-winner recommendation protocol

🔎 Similar Papers

No similar papers found.