Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of enabling an agent in a stochastic environment to satisfy a reach-avoid specification with probability at least $p$ while minimizing expected cumulative cost, a dual objective that existing safe and constrained reinforcement learning methods typically fail to enforce jointly. The paper introduces reach-avoid probability certificates (RAPCs), which identify the states from which the specification can be satisfied with the required probability. Building on RAPCs, the authors develop a contraction-based Bellman formulation that serves as a principled surrogate for incorporating the probabilistic constraint into reinforcement learning. They prove that the proposed algorithms converge almost surely to locally optimal policies with respect to the resulting objective. Experiments in the MuJoCo simulator show improved cost performance and consistently higher reach-avoid satisfaction rates.
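
The dual objective described above can be written as a chance-constrained optimization problem. The following is only a sketch of one common way to state it, not the paper's own formulation: the symbols $\pi$ (policy), $c$ (stage cost), $\mathcal{T}$ (target set), and $\mathcal{U}$ (unsafe set) are assumed notation, and the undiscounted infinite sum is likewise an assumption, since the summary does not specify the horizon or discounting.

```latex
\min_{\pi} \; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} c(s_t, a_t)\right]
\quad \text{s.t.} \quad
\mathbb{P}^{\pi}\!\left(\exists\, T \ge 0 :\; s_T \in \mathcal{T}
\;\wedge\; s_t \notin \mathcal{U} \;\; \forall\, t \le T\right) \;\ge\; p
```

Read this way, the constraint asks that the target be reached at some time $T$ without entering the unsafe set at any step up to $T$, with probability at least $p$.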

📝 Abstract

We study stochastic minimum-cost reach-avoid reinforcement learning, where an agent must satisfy a reach-avoid specification with probability at least $p$ while minimizing expected cumulative costs in stochastic environments. Existing safe and constrained reinforcement learning methods typically fail to jointly enforce probabilistic reach-avoid constraints and optimize cost when learning in stochastic environments. To address this challenge, we introduce reach-avoid probability certificates (RAPCs), which identify states from which stochastic reach-avoid constraints are satisfiable. Building on RAPCs, we develop a contraction-based Bellman formulation that serves as a principled surrogate for integrating reach-avoid considerations into reinforcement learning, enabling cost optimization under probabilistic constraints. We establish almost sure convergence of the proposed algorithms to locally optimal policies with respect to the resulting objective. Experiments in the MuJoCo simulator demonstrate improved cost performance and consistently higher reach-avoid satisfaction rates.
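
To make concrete the idea of "states from which stochastic reach-avoid constraints are satisfiable", the sketch below computes the maximum reach-avoid probability on a tiny tabular chain and thresholds it at $p$. This is an illustration under stated assumptions, not the paper's RAPC construction or its contraction-based Bellman formulation: the 5-state chain, the slip probability, and the function names `reach_avoid_probability` and `feasible_states` are all hypothetical, and the model is assumed known, whereas the paper addresses the learning setting.

```python
# Hypothetical 5-state chain (an assumption for illustration only):
# state 0 is unsafe and absorbing, state 4 is the target and absorbing.
N_STATES = 5
UNSAFE, TARGET = 0, 4
SLIP = 0.2          # assumed probability that a move goes the opposite way
ACTIONS = (-1, +1)  # step left or step right


def reach_avoid_probability(tol=1e-12, max_iters=10_000):
    """Fixed-point iteration for the maximum probability of reaching
    TARGET before entering UNSAFE, starting from each state."""
    v = [0.0] * N_STATES
    v[TARGET] = 1.0  # already at the target: success with probability 1
    for _ in range(max_iters):
        new_v = list(v)
        for s in range(1, N_STATES - 1):  # interior (non-absorbing) states
            # Best action: intended move succeeds w.p. 1 - SLIP, else reverses.
            new_v[s] = max(
                (1 - SLIP) * v[s + a] + SLIP * v[s - a] for a in ACTIONS
            )
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


def feasible_states(v, p):
    """States whose best achievable reach-avoid probability is at least p."""
    return [s for s, prob in enumerate(v) if prob >= p]


v = reach_avoid_probability()
feasible = feasible_states(v, p=0.9)
print([round(x, 4) for x in v], feasible)
```

In this toy chain, only states close enough to the target can meet a threshold of $p = 0.9$; a cost-minimizing policy would then be sought only among policies that keep the constraint satisfiable from the start state.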

Problem

Research questions and friction points this paper is trying to address.

stochastic reinforcement learning

minimum-cost

reach-avoid constraints

probabilistic safety

constrained optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

reach-avoid constraints

stochastic reinforcement learning

probability certificates