Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the asymptotic behavior of the greedy (exploitation-only) algorithm in multi-armed bandits with a known reward structure. It aims to establish necessary and sufficient conditions for sublinear regret (success) versus linear regret (failure). Method: We identify a *partial identifiability* property of the problem instance as the fundamental criterion and develop a unified theoretical framework integrating structured bandit theory, asymptotic statistics, and identifiability analysis. The analysis extends to contextual bandits and interactive decision-making with arbitrary feedback. Contribution/Results: We provide the first complete characterization of greedy performance for arbitrary finite reward structures, whereas prior work covered only a few specific structures. Once the identifiability property holds, the problem becomes easy: not only greedy but any algorithm satisfying a mild non-degeneracy condition achieves sublinear regret. The criterion is applied to a range of examples, demonstrating its broad applicability. This work establishes the first general-purpose, structure-agnostic identifiability condition for greedy learning in sequential decision-making.
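For reference, the success/failure dichotomy can be written with the standard pseudo-regret after $T$ rounds. The notation below ($\mu$ for mean rewards, $a_t$ for the arm chosen at round $t$) is ours, and the paper's precise probabilistic qualifiers may differ:

```latex
R(T) \;=\; \sum_{t=1}^{T} \bigl( \mu(a^\star) - \mu(a_t) \bigr),
\qquad a^\star \in \arg\max_{a} \mu(a),
\\[4pt]
\text{success: } \mathbb{E}[R(T)] = o(T),
\qquad
\text{failure: } \mathbb{E}[R(T)] = \Omega(T).
```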

📝 Abstract
We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, while prior work focused on a few specific ones. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret as a function of time. Our characterization identifies a partial identifiability property of the problem instance as the necessary and sufficient condition for the asymptotic success. Notably, once this property holds, the problem becomes easy: any algorithm will succeed (in the same sense as above), provided it satisfies a mild non-degeneracy condition. We further extend our characterization to contextual bandits and interactive decision-making with arbitrary feedback, and demonstrate its broad applicability across various examples.
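To make the success/failure dichotomy concrete, here is a minimal simulation sketch under assumptions of our own: the known reward structure is a finite list of candidate mean-reward vectors (hypotheses), rewards are Gaussian, and "greedy" means picking the hypothesis with the smallest squared error on the data so far and playing its best arm, with ties broken toward the first hypothesis. The function `greedy_regret` and both toy instances are hypothetical illustrations, not the paper's algorithm or examples.

```python
import random
from typing import Sequence


def greedy_regret(hypotheses: Sequence[Sequence[float]],
                  true_means: Sequence[float],
                  horizon: int,
                  noise_sd: float = 0.1,
                  seed: int = 0) -> float:
    """Simulate a least-squares greedy policy; return its cumulative pseudo-regret.

    Hypothetical setup: each hypothesis is a vector of mean rewards (one per arm),
    and true_means is assumed to be one of them (the structure is known, the
    instance is not).
    """
    rng = random.Random(seed)
    loss = [0.0] * len(hypotheses)  # squared-error loss of each hypothesis so far
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Greedy step: trust the best-fitting hypothesis (min() returns the first
        # minimal index, so ties go to the lowest index)...
        k = min(range(len(hypotheses)), key=lambda i: loss[i])
        f = hypotheses[k]
        # ...and play the arm it believes is optimal, with no deliberate exploration.
        arm = max(range(len(f)), key=lambda a: f[a])
        reward = rng.gauss(true_means[arm], noise_sd)
        # Only the played arm is observed; update every hypothesis's fit on it.
        for i, g in enumerate(hypotheses):
            loss[i] += (reward - g[arm]) ** 2
        regret += best_mean - true_means[arm]
    return regret


# Failure: arm 0 has mean 0.5 under both hypotheses, so playing it never refutes the wrong one.
print(greedy_regret([(0.5, 0.2), (0.5, 0.8)], true_means=(0.5, 0.8), horizon=2000))  # ≈ 600 (0.3 per round)
# Success: arm 0 has mean 0.6 under the true model, which exposes the wrong hypothesis.
print(greedy_regret([(0.5, 0.2), (0.6, 0.8)], true_means=(0.6, 0.8), horizon=2000))  # much smaller
```

In the first instance, the wrong hypothesis predicts the same mean as the truth on its own best arm, so the data greedy collects never distinguishes the two and regret grows linearly. In the second, playing that arm reveals a mean of 0.6 rather than 0.5, the wrong hypothesis stops fitting, and greedy switches to the optimal arm. Our reading is that this is the kind of obstruction the paper's identifiability condition rules out; the formal condition is stated in the paper itself.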
Problem


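When does the greedy algorithm succeed (sublinear regret) or fail (linear regret) in bandits with an arbitrary finite known reward structure?
Which property of the problem instance is necessary and sufficient for that success?
Does the characterization carry over to contextual bandits and interactive decision-making with arbitrary feedback?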
Innovation


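Full characterization of the greedy algorithm for arbitrary finite reward structures, beyond the few specific structures studied previously
Partial identifiability of the instance as the necessary and sufficient condition for asymptotic success; once it holds, any algorithm satisfying a mild non-degeneracy condition succeeds
Extension to contextual bandits and interactive decision-making with arbitrary feedback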