🤖 AI Summary
This paper studies adversarial (non-stochastic) linear partial monitoring with a finite action set, an infinite outcome space, and decoupled loss and feedback functions. For both locally and globally observable game structures, the authors propose an efficient optimization-based exploration algorithm, built on a simple instance of the exploration-by-optimization method. The main contribution is an instance-dependent regret bound that explicitly captures the alignment between observations and losses. The bound depends on the game structure in a transparent way: it achieves the standard √T rate in the easy (locally observable) regime and the T^{2/3} rate in the hard (globally observable) regime, resembling known bounds from the stochastic setting. The analysis further shows that the achieved dependence on the game structure can be tight in interesting cases, improving the structural insight and transparency of regret analysis in partial monitoring.
📝 Abstract
In contrast to the classic formulation of partial monitoring, linear partial monitoring can model infinite outcome spaces while imposing a linear structure on both the losses and the observations. This setting can be viewed as a generalization of linear bandits in which loss and feedback are decoupled in a flexible manner. In this work, we address a nonstochastic (adversarial) version of the problem with finitely many actions, through a simple instance of the exploration-by-optimization method that is amenable to efficient implementation. We derive regret bounds that depend on the game structure in a more transparent manner than previous theoretical guarantees for this paradigm. Our bounds feature instance-specific quantities that reflect the degree of alignment between observations and losses, and resemble known guarantees in the stochastic setting. Notably, they achieve the standard $\sqrt{T}$ rate in easy (locally observable) games and $T^{2/3}$ in hard (globally observable) games, where $T$ is the time horizon. We instantiate these bounds in a selection of old and new partial information settings subsumed by this model, and illustrate that the achieved dependence on the game structure can be tight in interesting cases.