🤖 AI Summary
This paper studies the sparse adversarial stochastic shortest path (SSP) problem under full-information feedback. Existing negative-entropy regularization achieves regret scaling with √log(SA) when transitions are known; while this is worst-case optimal, it cannot exploit cost sparsity (only M ≪ SA state-action pairs incur nonzero costs), and the negative-entropy regularizer provably still incurs √log(S) regret on sparse instances. To address this, we propose an online mirror descent algorithm with ℓᵣ-norm regularization for r ∈ (1,2). Our method adaptively leverages cost sparsity: under known transitions, it achieves regret scaling with √log(M), which we prove tight via a matching lower bound; under unknown transitions, we characterize the fundamental limit of sparsity gains, showing that any algorithm's minimax regret must scale polynomially with SA. These results establish that, with known transitions, the intrinsic complexity of sparse SSP is governed by the effective dimension M, not the full state-action space size SA.
📝 Abstract
We study the adversarial Stochastic Shortest Path (SSP) problem with sparse costs under full-information feedback. In the known transition setting, existing bounds based on Online Mirror Descent (OMD) with negative-entropy regularization scale with $\sqrt{\log SA}$, where $SA$ is the size of the state-action space. While we show that this is optimal in the worst case, this bound fails to capture the benefits of sparsity when only a small number $M \ll SA$ of state-action pairs incur cost. In fact, we also show that the negative-entropy is inherently non-adaptive to sparsity: it provably incurs regret scaling with $\sqrt{\log S}$ on sparse problems. Instead, we propose a family of $\ell_r$-norm regularizers ($r \in (1,2)$) that adapts to the sparsity and achieves regret scaling with $\sqrt{\log M}$ instead of $\sqrt{\log SA}$. We show this is optimal via a matching lower bound, highlighting that $M$ captures the effective dimension of the problem instead of $SA$. Finally, in the unknown transition setting the benefits of sparsity are limited: we prove that even on sparse problems, the minimax regret for any learner scales polynomially with $SA$.
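To make the central algorithmic idea concrete, the following is a minimal sketch of one OMD update with an $\ell_r$-norm regularizer on the probability simplex. This is an illustration only: the paper's algorithm runs OMD over occupancy measures of the SSP, and the exact regularizer, step size, and decision set are assumptions here, not taken from the paper. The sketch uses the separable mirror map $\Phi(x) = \sum_i x_i^r$ with $r \in (1,2)$ and performs the Bregman projection back onto the simplex by bisection over the dual variable.

```python
import numpy as np

def omd_lr_step(x, g, eta, r=1.5, tol=1e-12):
    """One OMD step on the simplex with the l_r regularizer
    Phi(x) = sum_i x_i**r, r in (1, 2).

    Illustrative sketch only: the paper's algorithm applies OMD over
    occupancy measures of the SSP; this simplex version just shows the
    mechanics of the l_r mirror map. `eta` is the step size, `g` the
    observed cost (loss gradient)."""
    # Dual (mirror) update: grad Phi(y_tilde) = grad Phi(x) - eta * g,
    # where grad Phi(x)_i = r * x_i**(r-1).
    theta = r * np.power(x, r - 1.0) - eta * g

    # Bregman projection onto the simplex. For a separable Phi it reduces
    # to finding mu such that y_i(mu) = ((theta_i + mu)/r)_+^(1/(r-1))
    # sums to one; the sum is increasing in mu, so bisection works.
    def y(mu):
        return np.maximum((theta + mu) / r, 0.0) ** (1.0 / (r - 1.0))

    lo = -theta.max()          # here y(lo).sum() == 0
    hi = r - theta.min()       # here every y_i >= 1, so y(hi).sum() >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if y(mid).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    return y(0.5 * (lo + hi))
```

As a sanity check of the dynamics: starting from the uniform distribution and charging cost only to the first coordinate shifts mass away from it while the untouched coordinates remain equal, mirroring how the regularizer reallocates play toward low-cost state-action pairs.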