On Characterizing Learnability for Adversarial Noisy Bandits

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates the learnability of a function class ℱ in adversarial noisy bandit settings, that is, whether any algorithm can achieve sublinear regret. Its central complexity measure is the “convexified generalized maximin volume”, a convexified variant of the generalized maximin volume of Hanneke and Wang (2025). Building on a connection between this quantity and the existence of simple hitting sets, the paper characterizes learnability against oblivious adversaries, and against adaptive adversaries when the arm space is countable. For uncountable arm spaces, it conjectures that the same quantity still characterizes learnability, via a new complexity measure called the “distribution covering number”: a strengthened form of hitting set that still admits efficient learning via the multiplicative weights algorithm. It also poses several open questions.

📝 Abstract

We study adversarial noisy bandits given a known function class $\mathcal{F}$. In each round, the adversary selects a function $f \in \mathcal{F}$, the learner chooses an arm, and then observes a noisy reward determined by the chosen arm and the function $f$. The goal is to minimize the cumulative regret $R(T)$, defined as the difference between the learner's performance and that of the best fixed arm in hindsight over $T$ rounds. We say that a function class $\mathcal{F}$ is learnable if there exists an algorithm achieving sublinear regret. Our main results concern characterizing learnability. The main quantity appearing in our characterization is a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025). For oblivious adversaries, we characterize learnability in terms of this convexified generalized maximin volume. For adaptive adversaries, we show that the same quantity characterizes learnability when the arm space is countable. Our analysis builds on a connection between convexified generalized maximin volume and the existence of simple hitting sets. We further conjecture that the same quantity also characterizes learnability when the arm space is uncountable, via its relation to a new complexity measure, which we call the distribution covering number. This notion can be viewed as a strengthened form of the hitting set that still admits efficient learning via the multiplicative weights algorithm. We also pose a number of relevant open questions regarding this problem.
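To make the interaction protocol and the regret definition concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: a finite arm set, Bernoulli reward noise with mean $f_t(a)$, an oblivious adversary whose sequence $f_1, \dots, f_T$ is fixed in advance, and the standard Exp3 algorithm (multiplicative weights on importance-weighted loss estimates) standing in for the learner. The regret computed is the pseudo-regret $\max_a \sum_{t=1}^{T} f_t(a) - \sum_{t=1}^{T} f_t(a_t)$. The names `run_exp3` and `example_sequence` are hypothetical and used only for illustration.

```python
import math
import random


def run_exp3(reward_fns, n_arms, eta, seed=0):
    """Play Exp3 against a fixed (oblivious) sequence of mean-reward functions.

    reward_fns: list of callables f_t(arm) -> mean reward in [0, 1].
    Returns the pseudo-regret against the best fixed arm in hindsight.
    """
    rng = random.Random(seed)
    cum_loss_est = [0.0] * n_arms   # importance-weighted cumulative loss estimates
    learner_total = 0.0             # sum of f_t(a_t) over rounds
    arm_totals = [0.0] * n_arms     # sum of f_t(a) for each fixed arm a

    for f in reward_fns:
        # Multiplicative weights: p(a) proportional to exp(-eta * estimated loss).
        # Subtracting the minimum keeps the exponentials numerically stable.
        base = min(cum_loss_est)
        weights = [math.exp(-eta * (loss - base)) for loss in cum_loss_est]
        total = sum(weights)
        probs = [w / total for w in weights]

        # The learner samples an arm and sees only a noisy (Bernoulli) reward.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1.0 if rng.random() < f(arm) else 0.0

        # Unbiased loss estimate: only the pulled arm is updated.
        cum_loss_est[arm] += (1.0 - reward) / probs[arm]

        # Bookkeeping for the regret (uses the true means, unseen by the learner).
        learner_total += f(arm)
        for a in range(n_arms):
            arm_totals[a] += f(a)

    return max(arm_totals) - learner_total


def example_sequence(T):
    """A toy oblivious adversary alternating between two mean-reward tables."""
    table_a = [0.9, 0.1, 0.1]
    table_b = [0.8, 0.3, 0.1]
    return [(lambda arm, tab=(table_a if t % 2 == 0 else table_b): tab[arm])
            for t in range(T)]


if __name__ == "__main__":
    T, K = 2000, 3
    eta = math.sqrt(2.0 * math.log(K) / (T * K))  # a common Exp3 step size
    print("pseudo-regret:", run_exp3(example_sequence(T), K, eta))
```

In this toy setting a learnable class corresponds to the regret growing more slowly than $T$. The paper's question is when such behavior is achievable at all for a general class $\mathcal{F}$, including infinite arm spaces where running multiplicative weights directly over the arms is not possible.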

Problem

Research questions and friction points this paper is trying to address.

adversarial noisy bandits

learnability

cumulative regret

function class

sublinear regret

Innovation

Methods, ideas, or system contributions that make the work stand out.

convexified generalized maximin volume

adversarial noisy bandits

learnability characterization