🤖 AI Summary
This work addresses the lack of rigorous convergence guarantees for reinforcement learning (RL) algorithms in stochastic shortest path (SSP) problems. We propose two tabular RL algorithms and one RL algorithm based on function approximation. Methodologically, we integrate dynamic programming principles with stochastic approximation theory, designing value-iteration-style update rules coupled with a step-size sequence that satisfies the Robbins–Monro conditions. This achieves, for the first time in the SSP setting, asymptotic almost-sure convergence under multiple cost criteria (e.g., total, discounted, and average cost). Theoretically, we establish the first unified convergence framework for SSP that applies to both tabular and function approximation settings. Empirically, the tabular algorithms outperform other well-known convergent RL algorithms, and the function approximation algorithm performs reliably compared with other algorithms in the function approximation setting.
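For orientation, the two ingredients named above have the following standard textbook forms. The notation is an assumption introduced here for illustration, not taken from the paper: $J^*$ is the optimal cost-to-go, $c(s,a)$ the one-step cost, $p(s' \mid s,a)$ the transition probability, $g$ the cost-free absorbing goal state, and $\alpha_n$ the step size at iteration $n$.

```latex
% SSP Bellman optimality equation (goal state g is absorbing and cost-free)
J^*(s) = \min_{a \in A(s)} \Big[ c(s,a) + \sum_{s'} p(s' \mid s,a)\, J^*(s') \Big],
\qquad J^*(g) = 0

% Robbins-Monro step-size conditions
\sum_{n=0}^{\infty} \alpha_n = \infty,
\qquad
\sum_{n=0}^{\infty} \alpha_n^2 < \infty
```

A schedule such as $\alpha_n = 1/(n+1)$ satisfies both conditions. The paper's specific update rules are not reproduced here.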
📝 Abstract
In this paper, we propose two algorithms for the tabular setting and one algorithm for the function approximation setting of the Stochastic Shortest Path (SSP) problem. SSP problems form an important class of problems in Reinforcement Learning (RL) because other cost criteria in RL can be formulated as SSP problems. We show asymptotic almost-sure convergence for all our algorithms. We observe superior performance of our tabular algorithms compared to other well-known convergent RL algorithms. We further observe reliable performance of our function approximation algorithm compared to other algorithms in the function approximation setting.
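To make the tabular SSP setting concrete, here is a minimal sketch of standard Q-learning, a well-known convergent tabular method, on a toy SSP. It is not one of the paper's algorithms, and whether the paper uses Q-learning as a baseline is an assumption. The four-state chain, the costs, the 0.8 success probability, the uniform exploration policy, and the step-size exponent 0.7 are all hypothetical choices made for illustration.

```python
import random

GOAL = 3                 # absorbing, cost-free goal state (hypothetical toy problem)
STATES = [0, 1, 2]       # non-goal states
ACTIONS = [0, 1]         # 0 = "step" (cheap, may fail), 1 = "jump" (costly, always reaches the goal)


def simulate(state, action, rng):
    """Sample (cost, next_state) for one transition of the toy SSP."""
    if action == 1:
        return 3.0, GOAL             # jump: cost 3, lands on the goal
    if rng.random() < 0.8:
        return 1.0, state + 1        # step succeeds with probability 0.8
    return 1.0, state                # step fails: pay the cost, stay in place


def q_learning(episodes=20000, seed=0):
    """Tabular Q-learning for cost minimisation with per-pair decaying step sizes."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {key: 0 for key in q}
    for _ in range(episodes):
        s = rng.choice(STATES)                       # random non-goal start state
        while s != GOAL:
            a = rng.choice(ACTIONS)                  # uniform exploration (assumption)
            cost, s_next = simulate(s, a, rng)
            # Cost-to-go of the goal is zero; otherwise take the cheapest action.
            best_next = 0.0 if s_next == GOAL else min(q[(s_next, b)] for b in ACTIONS)
            visits[(s, a)] += 1
            # alpha_n = n^(-0.7): sum alpha_n diverges, sum alpha_n^2 converges.
            alpha = visits[(s, a)] ** -0.7
            q[(s, a)] += alpha * (cost + best_next - q[(s, a)])
            s = s_next
    return q


def greedy_costs(q):
    """Estimated optimal cost-to-go for each non-goal state."""
    return {s: min(q[(s, a)] for a in ACTIONS) for s in STATES}


if __name__ == "__main__":
    print(greedy_costs(q_learning()))
```

Solving the Bellman equation of this toy problem by hand gives optimal costs of 3.0, 2.5, and 1.25 for states 0, 1, and 2, so the printed estimates should land close to those values. No undiscounted blow-up occurs here because every policy reaches the goal with probability one, which is the usual properness assumption in SSP analyses.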