Convergent Reinforcement Learning Algorithms for Stochastic Shortest Path Problem

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of rigorous convergence guarantees for reinforcement learning (RL) algorithms in stochastic shortest path (SSP) problems. We propose two tabular RL algorithms and one RL algorithm based on function approximation. Methodologically, we combine dynamic programming principles with stochastic approximation theory: value-iteration-style update rules are paired with a step-size sequence satisfying the Robbins–Monro conditions. All three algorithms are shown to converge asymptotically almost surely. Because other cost criteria in RL can be formulated as SSP problems, these guarantees are relevant beyond the SSP setting itself. Empirically, the tabular algorithms outperform other well-known convergent RL algorithms, and the function approximation algorithm performs reliably compared to other algorithms in that setting.
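To make the idea of a value-iteration-style update with Robbins–Monro step sizes concrete, here is a minimal sketch. It is generic tabular Q-learning applied to an SSP, not the algorithms proposed in the paper, which this page does not spell out. The toy chain problem, the function names, and the step-size exponent 0.6 are all assumptions chosen for illustration.

```python
import numpy as np

def make_chain(n_states=3):
    """Hypothetical toy SSP: states 0..n-1, terminal state n, two actions."""
    P = np.zeros((n_states, 2, n_states + 1))  # P[s, a, s']
    c = np.zeros((n_states, 2))                # one-step cost c[s, a]
    for s in range(n_states):
        P[s, 0, s + 1] = 1.0   # action 0: always advance, cost 1
        c[s, 0] = 1.0
        P[s, 1, s + 1] = 0.5   # action 1: advance w.p. 0.5, else stay, cost 0.4
        P[s, 1, s] = 0.5
        c[s, 1] = 0.4
    return P, c

def value_iteration(P, c, n_iter=500):
    """Model-based reference: Q(s,a) = c(s,a) + sum_s' P(s'|s,a) min_b Q(s',b)."""
    n_states, n_actions = c.shape
    Q = np.zeros((n_states + 1, n_actions))  # last row is the terminal state, kept at 0
    for _ in range(n_iter):
        J = Q.min(axis=1)
        Q[:n_states] = c + P @ J
    return Q[:n_states]

def q_learning_ssp(P, c, n_updates=60_000, seed=0):
    """Sample-based update with step size alpha_n = n^(-0.6).

    This choice satisfies the Robbins-Monro conditions:
    sum(alpha_n) diverges and sum(alpha_n^2) is finite.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = c.shape
    Q = np.zeros((n_states + 1, n_actions))  # terminal row is never updated
    visits = np.zeros((n_states, n_actions))
    for _ in range(n_updates):
        s = rng.integers(n_states)           # sample a state-action pair uniformly
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states + 1, p=P[s, a])
        visits[s, a] += 1
        alpha = visits[s, a] ** -0.6
        target = c[s, a] + Q[s_next].min()   # undiscounted: total cost to the goal
        Q[s, a] += alpha * (target - Q[s, a])
    return Q[:n_states]

P, c = make_chain()
Q_star = value_iteration(P, c)
Q_hat = q_learning_ssp(P, c)
print("max |Q_hat - Q_star| =", np.abs(Q_hat - Q_star).max())
```

The sample-based estimate can be compared against the model-based reference to see the stochastic update settle near the dynamic programming fixed point. Uniform sampling of state-action pairs is used here only to keep the sketch short; an online agent would follow trajectories instead.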

📝 Abstract
In this paper, we propose two algorithms for the tabular setting and one algorithm for the function approximation setting of the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL), as other types of cost criteria in RL can be formulated in the SSP setting. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.
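As an illustration of how another cost criterion can be cast as an SSP, the standard textbook reduction for the discounted-cost criterion is sketched below. The abstract does not say which reductions the authors rely on, so this is background rather than the paper's construction. The symbols $\gamma$, $p$, $c$, $\tilde p$ and the terminal state $t$ are assumptions introduced here.

```latex
% Discounted-cost Bellman equation with discount factor 0 < \gamma < 1:
J(s) = \min_{a} \Big[ c(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, J(s') \Big]

% Equivalent SSP: add a cost-free absorbing terminal state t and define
\tilde p(s' \mid s,a) = \gamma\, p(s' \mid s,a), \qquad
\tilde p(t \mid s,a) = 1 - \gamma, \qquad J(t) = 0.

% The undiscounted SSP Bellman equation under \tilde p is then
J(s) = \min_{a} \Big[ c(s,a) + \sum_{s'} \tilde p(s' \mid s,a)\, J(s') + \tilde p(t \mid s,a)\, J(t) \Big]
     = \min_{a} \Big[ c(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, J(s') \Big].
```

In this construction every policy reaches $t$ with probability one, since each step terminates with probability $1-\gamma$, so the resulting SSP has a well-defined finite total cost.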
Problem

Research questions and friction points this paper is trying to address.

Developing convergent algorithms for Stochastic Shortest Path problems
Addressing both tabular and function approximation RL settings
Ensuring asymptotic almost-sure convergence in SSP solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Convergent RL algorithms for SSP
Asymptotic almost-sure convergence guarantee
Superior tabular and reliable approximation performance
Soumyajit Guin
Department of Computer Science and Automation, Indian Institute of Science, Bengaluru 560012, India
Shalabh Bhatnagar
Professor in the Department of Computer Science and Automation, Indian Institute of Science
Stochastic systems, control, simulation, optimization