Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

📅 2025-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the **redundancy identification problem** for items under nontransitive preference relations in the dueling bandits framework, i.e., determining whether every one of the $n$ items can be selected by a rational player (no item is redundant). For canonical nontransitive structures such as rock-paper-scissors, we first formalize redundancy and derive theoretical criteria: an item is non-redundant iff the homogeneous linear system $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$ induced by the payoff matrix $A$ admits a nonzero nonnegative solution; we further establish a tight necessary and sufficient condition based on $\det(A)$. Our theory reveals that any variant of Jan-ken with four items must contain at least one redundant item. We further characterize the optimal sample complexity of the problem as $\Theta(n^2/\Delta^2)$. Methodologically, we integrate tools from game theory (Nash equilibrium analysis), linear algebra, and statistical learning theory. Empirical validation on Jan-ken variants confirms the tightness of our bounds.

📝 Abstract

The dueling bandit is a variant of the multi-armed bandit that learns a binary relation through comparisons. Most work on the dueling bandit has targeted transitive relations, that is, totally or partially ordered sets, or has assumed at least the existence of a champion such as a Condorcet winner or a Copeland winner. This work develops an analysis of dueling bandits for non-transitive relations. Jan-ken (a.k.a. rock-paper-scissors) is a typical example of a non-transitive relation. It is known that a rational player chooses one of the three items uniformly at random, which is a Nash equilibrium in game theory. Interestingly, any variant of Jan-ken with four items (e.g., rock, paper, scissors, and well) contains at least one useless item, which is never selected by a rational player. This work investigates a dueling bandit problem to identify whether all $n$ items are indispensable in a given win-lose relation. We then provide upper and lower bounds on the sample complexity of the identification problem in terms of the determinant of $A$ and a solution of $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$, where $A$ is an $n \times n$ pay-off matrix that every duel follows.
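
To make the two quantities in the abstract concrete, here is a minimal sketch that computes $\det(A)$ and one nonzero solution of $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$ for a known win-lose relation. It is not the paper's identification algorithm, which has to learn $A$ from noisy duels. The encoding of $A$ (entry $+1$ if the row item beats the column item, $-1$ if it loses, $0$ on the diagonal), the rules for the "well" item (well beats rock and scissors, paper beats well), and the helper names `determinant` and `left_null_vector` are all assumptions made for illustration.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    n = len(matrix)
    m = [[Fraction(v) for v in row] for row in matrix]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot in this column: singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def left_null_vector(matrix):
    """Return one nonzero x with x^T A = 0, or None if only x = 0 works."""
    n = len(matrix)
    # x^T A = 0 is equivalent to A^T x = 0, so row-reduce the transpose.
    m = [[Fraction(matrix[r][c]) for r in range(n)] for c in range(n)]
    pivot_cols = []
    row = 0
    for col in range(n):
        pivot = next((r for r in range(row, n) if m[r][col] != 0), None)
        if pivot is None:
            continue  # free column
        m[row], m[pivot] = m[pivot], m[row]
        lead = m[row][col]
        m[row] = [v / lead for v in m[row]]
        for r in range(n):
            if r != row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[row])]
        pivot_cols.append(col)
        row += 1
    free_cols = [c for c in range(n) if c not in pivot_cols]
    if not free_cols:
        return None
    # Set the first free variable to 1 and back-substitute the pivots.
    free = free_cols[0]
    x = [Fraction(0)] * n
    x[free] = Fraction(1)
    for r, pc in enumerate(pivot_cols):
        x[pc] = -m[r][free]
    return x


# Assumed encoding, item order: rock, paper, scissors (, well).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
RPSW = [[0, -1, 1, -1],
        [1, 0, -1, 1],
        [-1, 1, 0, -1],
        [1, -1, 1, 0]]

if __name__ == "__main__":
    for name, A in (("RPS", RPS), ("RPSW", RPSW)):
        print(name, "det =", determinant(A), "x =", left_null_vector(A))
```

Under these assumptions, the three-item matrix is singular and the solution found has equal entries, which after normalization is the uniform strategy mentioned in the abstract. The four-item matrix has a nonzero determinant, so only $\mathbf{x} = \mathbf{0}$ solves the system, consistent with the statement that a four-item variant contains a useless item. Exact rational arithmetic is used so that the zero test on the determinant is not affected by floating-point error.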

Problem

Research questions and friction points this paper is trying to address.

Analyzing dueling bandits for non-transitive relations

Identifying indispensable items in win-lose relations

Providing sample complexity bounds for identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes dueling bandits for non-transitive relations

Identifies indispensable items in win-lose relations

Bounds sample complexity using pay-off matrix properties

🔎 Similar Papers

Multi-Player Approaches for Dueling Bandits