Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries

📅 2025-04-01
🤖 AI Summary
This paper studies the online shortest path problem on directed acyclic graphs (DAGs) against an adaptive adversary: in each round, the learner selects a source-to-sink path and observes only the total loss of that path (bandit feedback), aiming to minimize regret relative to the best fixed path in hindsight over $T$ rounds. To tackle this strongly adversarial setting, the paper proposes the first computationally efficient algorithm, whose core innovations are a novel edge-loss estimator and a high-probability analysis built on a centroid-based decomposition; together these overcome long-standing analytical bottlenecks for bandit feedback under adaptivity. Theoretically, the algorithm achieves a high-probability regret bound of $\tilde{O}(\sqrt{|E| T \log |X|})$, where $|E|$ is the number of edges and $|X|$ the number of source-to-sink paths, which is nearly minimax optimal. Moreover, the approach unifies and improves algorithmic guarantees for several fundamental combinatorial decision problems, including $m$-sets, extensive-form games, the Colonel Blotto game, and hypercube decision spaces.
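The $\log |X|$ term in the bound depends on the number of source-to-sink paths, which can be exponential in the size of the graph. As a rough illustration (not taken from the paper), the sketch below counts $|X|$ with a memoized dynamic program over the DAG and evaluates the scale $\sqrt{|E| T \log |X|}$, ignoring constants and the logarithmic factors hidden by $\tilde O$. The helper names `count_paths` and `diamond_chain`, the example graph, and the horizon `T = 10_000` are hypothetical choices made only for this example.

```python
import math
from functools import lru_cache


def count_paths(edges, source, sink):
    """Number of source-to-sink paths |X| in a DAG given as (u, v) pairs.
    Memoized DP: polynomial time even when |X| is exponential in |V|."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == sink:
            return 1
        # Nodes with no route to the sink contribute 0 paths.
        return sum(paths_from(nxt) for nxt in succ.get(node, []))

    return paths_from(source)


def diamond_chain(k):
    """Hypothetical example DAG: k two-route 'diamonds' in series,
    so |E| = 4k and |X| = 2**k."""
    edges = []
    for i in range(k):
        a, b = f"v{i}", f"v{i + 1}"
        for side in ("up", "down"):
            mid = f"{side}{i}"
            edges += [(a, mid), (mid, b)]
    return edges


edges = diamond_chain(10)
num_paths = count_paths(edges, "v0", "v10")
T = 10_000
# Scale of the bound only: constants and hidden log factors are dropped.
scale = math.sqrt(len(edges) * T * math.log(num_paths))
print(len(edges), num_paths)  # 40 1024
print(round(scale, 1))
```

Because every path is a distinct subset of $E$, $|X| \le 2^{|E|}$, so $\log |X| \le |E| \log 2$ and the bound never grows faster than $|E|\sqrt{T}$ up to logarithmic factors. In the diamond chain, $|X| = 2^{10}$ while $|E| = 40$, so the exponentially large path set enters only through its logarithm.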

📝 Abstract
In this paper, we study the online shortest path problem in directed acyclic graphs (DAGs) under bandit feedback against an adaptive adversary. Given a DAG $G = (V, E)$ with a source node $v_{\mathsf{s}}$ and a sink node $v_{\mathsf{t}}$, let $X \subseteq \{0,1\}^{|E|}$ denote the set of all paths from $v_{\mathsf{s}}$ to $v_{\mathsf{t}}$. At each round $t$, we select a path $\mathbf{x}_t \in X$ and receive bandit feedback on our loss $\langle \mathbf{x}_t, \mathbf{y}_t \rangle \in [-1,1]$, where $\mathbf{y}_t$ is an adversarially chosen loss vector. Our goal is to minimize regret with respect to the best path in hindsight over $T$ rounds. We propose the first computationally efficient algorithm to achieve a near-minimax optimal regret bound of $\tilde O(\sqrt{|E|T\log |X|})$ with high probability against any adaptive adversary, where $\tilde O(\cdot)$ hides logarithmic factors in the number of edges $|E|$. Our algorithm leverages a novel loss estimator and a centroid-based decomposition in a nontrivial manner to attain this regret bound. As an application, we show that our algorithm for DAGs provides state-of-the-art efficient algorithms for $m$-sets, extensive-form games, the Colonel Blotto game, shortest walks in directed graphs, hypercubes, and multi-task multi-armed bandits, achieving improved high-probability regret guarantees in all these settings.
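To make the protocol concrete, the following brute-force sketch (a toy illustration, not the paper's algorithm) encodes each path as an indicator vector in $\{0,1\}^{|E|}$, computes the bandit feedback $\langle \mathbf{x}_t, \mathbf{y}_t \rangle$ as an inner product, and evaluates regret against the best fixed path in hindsight. It assumes a simple DAG (no parallel edges) given as a list of `(u, v)` pairs; the function names and the loss vectors are hypothetical, and enumerating all paths is only feasible for tiny graphs.

```python
def enumerate_paths(edges, source, sink):
    """Brute-force list of all source-to-sink paths as 0/1 vectors over `edges`.
    Exponential in general; only meant for tiny illustrative DAGs."""
    index = {e: i for i, e in enumerate(edges)}
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    paths = []

    def dfs(node, used):
        if node == sink:
            x = [0] * len(edges)
            for e in used:
                x[index[e]] = 1
            paths.append(tuple(x))
            return
        for nxt in succ.get(node, []):
            dfs(nxt, used + [(node, nxt)])

    dfs(source, [])
    return paths


def bandit_loss(x, y):
    """The only feedback the learner sees in round t: the scalar <x_t, y_t>."""
    return sum(xi * yi for xi, yi in zip(x, y))


def regret(played, loss_vectors, paths):
    """Cumulative loss of the played paths minus that of the best fixed path.
    Evaluated in hindsight with the full y_t, which the learner never observes."""
    learner = sum(bandit_loss(x, y) for x, y in zip(played, loss_vectors))
    best = min(sum(bandit_loss(p, y) for y in loss_vectors) for p in paths)
    return learner - best


edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("s", "t")]
paths = enumerate_paths(edges, "s", "t")
print(paths)  # [(1, 1, 0, 0, 0), (0, 0, 1, 1, 0), (0, 0, 0, 0, 1)]
ys = [(0.5, 0.25, 0.0, 0.0, 0.5),
      (0.0, 0.25, 0.5, 0.25, 0.5),
      (0.25, 0.25, 0.0, 0.25, 0.5)]
# Play the three paths in order: learner loss 2.0, best path s-b-t has 1.0.
print(regret(paths, ys, paths))  # 1.0
```

Against an adaptive adversary, each $\mathbf{y}_t$ may depend on the learner's earlier choices $\mathbf{x}_1, \dots, \mathbf{x}_{t-1}$; regret is still measured on the realized sequence, which is what `regret` computes here.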
Problem

Online shortest path problem in DAGs with bandit feedback.
Minimizing regret against adaptive adversaries efficiently.
Achieving improved high-probability regret guarantees for applications such as $m$-sets, extensive-form games, and the Colonel Blotto game.
Innovation

Efficient algorithm for online DAG shortest paths
Novel loss estimator for bandit feedback
Centroid-based decomposition for near-minimax optimal regret
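Efficient algorithms for path problems typically avoid maintaining one weight per path. A standard ingredient in exponential-weights methods for DAGs, often called weight pushing, samples a path with probability proportional to the product of its edge weights using a single backward dynamic program. The sketch below shows that generic technique only; it is not claimed to be the paper's sampler, and the names `build_path_sampler` and `path_probability` are hypothetical. With weights $w_e = \exp(-\eta \hat L_e)$ for some cumulative edge-loss estimate $\hat L_e$ and learning rate $\eta$ (both placeholders here, not the paper's novel estimator), it draws path $\mathbf{x}$ with probability proportional to $\exp(-\eta \langle \mathbf{x}, \hat{\mathbf{L}} \rangle)$.

```python
import random
from functools import lru_cache


def build_path_sampler(edges, weights, source, sink):
    """Weight pushing on a DAG: sample a source-to-sink path with probability
    proportional to the product of its (positive) edge weights."""
    succ = {}
    for (u, v), w in zip(edges, weights):
        succ.setdefault(u, []).append((v, w))

    @lru_cache(maxsize=None)
    def beta(node):
        # Total weight of all node-to-sink paths (backward DP, memoized).
        if node == sink:
            return 1.0
        return sum(w * beta(v) for v, w in succ.get(node, []))

    def sample(rng):
        # Walk forward from the source, taking edge (u, v) with probability
        # w_uv * beta(v) / beta(u); successors that cannot reach the sink
        # (beta(v) == 0) are skipped. Assumes the sink is reachable.
        node, path = source, []
        while node != sink:
            options = [(v, w * beta(v)) for v, w in succ[node] if beta(v) > 0]
            nxt = rng.choices([v for v, _ in options],
                              weights=[m for _, m in options], k=1)[0]
            path.append((node, nxt))
            node = nxt
        return path

    return beta, sample


def path_probability(path, edges, weights, beta, source):
    """Exact probability that `sample` returns `path` (a list of edges)."""
    w = dict(zip(edges, weights))
    prob = 1.0
    for e in path:
        prob *= w[e]
    return prob / beta(source)


edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("s", "t")]
weights = [1.0, 1.0, 2.0, 2.0, 4.0]  # path weights: s-a-t 1, s-b-t 4, s-t 4
beta, sample = build_path_sampler(edges, weights, "s", "t")
print(beta("s"))  # 9.0
print(path_probability([("s", "t")], edges, weights, beta, "s"))  # 4/9 ≈ 0.444
print(sample(random.Random(0)))
```

The backward DP evaluates each edge once, and each draw then costs $O(|E|)$ time, so sampling stays polynomial in $|E|$ even though the induced distribution ranges over all of $X$.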