Achieving Optimal Static and Dynamic Regret Simultaneously in Bandits with Deterministic Losses

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of simultaneously achieving optimal static and dynamic regret bounds in adversarial multi-armed bandits with deterministic losses, a goal not attained by any existing algorithm. Focusing on the setting with oblivious (non-adaptive) adversaries and deterministic losses, the paper proposes a novel algorithm that, for the first time, attains minimax-optimal bounds for both regret notions concurrently. The approach integrates a negative static regret mechanism with Blackwell's approachability theory, enabling effective model selection by compensating for exploration costs. Moreover, the analysis reveals a fundamental separation between adaptive and oblivious adversaries in multi-benchmark regret: simultaneous optimality is provably unattainable against adaptive adversaries but achievable against oblivious ones, offering new insight into the problem of switching reference benchmarks.

📝 Abstract
In adversarial multi-armed bandits, two performance measures are commonly used: static regret, which compares the learner to the best fixed arm, and dynamic regret, which compares it to the best sequence of arms. While optimal algorithms are known for each measure individually, there is no known algorithm achieving optimal bounds for both simultaneously. Marinov and Zimmert [2021] first showed that such simultaneous optimality is impossible against an adaptive adversary. Our work takes a first step toward demonstrating its possibility against an oblivious adversary when losses are deterministic. First, we extend the impossibility result of Marinov and Zimmert [2021] to the case of deterministic losses. Then, we present an algorithm achieving optimal static and dynamic regret simultaneously against an oblivious adversary. Together, these results reveal a fundamental separation between adaptive and oblivious adversaries when multiple regret benchmarks are considered simultaneously. They also provide new insight into the long-standing open problem of simultaneously achieving optimal regret against switching benchmarks with different numbers of switches. Our algorithm uses negative static regret to compensate for the exploration overhead incurred when controlling dynamic regret, and leverages Blackwell approachability to jointly control both regrets. This yields a new model selection procedure for bandits that may be of independent interest.
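For reference, the two benchmarks discussed in the abstract can be written out as follows. This is a standard formulation sketched here for context; the notation (losses ℓ_t over K arms, switch budget S) is assumed, not taken from the paper itself.

```latex
% Static regret: compare the learner's cumulative loss
% to that of the best single arm in hindsight.
R_T^{\mathrm{static}}
  = \sum_{t=1}^{T} \ell_t(a_t)
  - \min_{a \in [K]} \sum_{t=1}^{T} \ell_t(a)

% Dynamic regret: compare to the best comparator sequence
% (a_1^*, \dots, a_T^*) with at most S switches.
R_T^{\mathrm{dyn}}(S)
  = \sum_{t=1}^{T} \ell_t(a_t)
  - \min_{\substack{a_1^*, \dots, a_T^* \in [K] \\ \#\{t : a_t^* \neq a_{t+1}^*\} \le S}}
    \sum_{t=1}^{T} \ell_t(a_t^*)
```

For standard adversarial bandits, the minimax rates for these two notions are on the order of √(KT) and √(SKT) respectively; the difficulty the paper addresses is attaining both orders at once with a single algorithm, for every switch budget simultaneously.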
Problem

Research questions and friction points this paper is trying to address.

bandits
static regret
dynamic regret
oblivious adversary
deterministic losses
Innovation

Methods, ideas, or system contributions that make the work stand out.

simultaneous regret optimality
oblivious adversary
deterministic losses
Blackwell approachability
model selection