Safety-Constrained Reinforcement Learning with Post-Training Reachability Verification for Robot Navigation

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing safe reinforcement learning methods that rely on average cumulative cost and thus fail to adequately capture high-risk tail events, leading to insufficient policy reliability in complex environments with perceptual uncertainty. To bridge this gap, the study introduces a novel integration of conditional value-at-risk (CVaR)-constrained reinforcement learning with Taylor-model-based reachability analysis. By combining off-policy TD3 optimization with the computation of action reachable sets under bounded observation uncertainty, the approach exposes a critical inconsistency between conventional average-cost metrics and formal safety verification. Evaluated across ten navigation scenarios, the method achieves a 98.3% success rate while demonstrating superior safety verification performance and successfully transferring from simulation to real-world deployment on a Clearpath Jackal robot.
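To illustrate why average cumulative cost can hide tail events, the following minimal sketch compares the mean and an empirical CVaR for two sets of episode costs. The function name `empirical_cvar`, the level `alpha = 0.9`, and the two cost arrays are hypothetical values chosen for illustration; they are not taken from the paper, whose exact CVaR estimator is not given here.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of episode costs."""
    costs = np.sort(np.asarray(costs, dtype=float))
    # Number of samples in the upper tail; keep at least one.
    k = max(1, int(np.ceil((1.0 - alpha) * costs.size)))
    return costs[-k:].mean()

# Two hypothetical policies with identical average cost.
steady = np.full(100, 1.0)   # every episode incurs cost 1.0
risky = np.zeros(100)
risky[:5] = 20.0             # 5 of 100 episodes incur cost 20.0

mean_steady, mean_risky = steady.mean(), risky.mean()
cvar_steady = empirical_cvar(steady, alpha=0.9)
cvar_risky = empirical_cvar(risky, alpha=0.9)
```

Both arrays have a mean cost of 1.0, so an average-cost constraint cannot tell them apart, while the CVaR at level 0.9 is 1.0 for the steady policy and 10.0 for the risky one.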

📝 Abstract

Safe navigation for mobile robots demands policies that remain reliable under the high-consequence perception uncertainty of cluttered environments. Yet most existing safe reinforcement learning (RL) methods assess safety through average cumulative cost, a metric that can mask dangerous tail-risk behaviors. To address this, we propose a framework that trains risk-sensitive policies through Conditional Value-at-Risk (CVaR) constrained optimization on an off-policy TD3 backbone and evaluates their safety margins post-training through neural network reachability verification. During training, the policy is optimized under CVaR constraints on cumulative costs, promoting sensitivity to high-cost tail outcomes rather than average behavior alone. After training, we compute action reachable sets under bounded observation uncertainty using Taylor Model analysis, yielding a safety rate metric that quantifies the proportion of evaluated states at which the policy's reachable action set remains within prescribed safety margins. A key finding is that policies trained with CVaR constraints maintain larger safety margins from obstacles across evaluated states, which makes them significantly more amenable to formal reachability verification. Experiments across ten navigation scenarios and six baselines show that our method achieves a 98.3% success rate and the highest safety verification rate among all compared methods, while revealing that average cost rankings and reachability-based safety rankings can diverge. This indicates that reachability verification captures risks that empirical cost metrics alone miss. We further validate our approach on a physical Clearpath Jackal robot, demonstrating successful sim-to-real transfer.
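The safety rate described above can be sketched as follows. This is not the paper's Taylor Model analysis: it substitutes plain interval bound propagation, a coarser over-approximation, to keep the example short. The toy two-layer ReLU policy, the uncertainty radius `eps`, and the action limits `a_min` and `a_max` standing in for the "prescribed safety margins" are all assumptions made for illustration.

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    """Propagate the box [lo, hi] through x -> W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def action_bounds(obs, eps, layers):
    """Over-approximate the actions reachable from observations within eps of obs."""
    lo, hi = obs - eps, obs + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def safety_rate(states, eps, layers, a_min, a_max):
    """Fraction of states whose reachable action box lies inside [a_min, a_max]."""
    safe = 0
    for obs in states:
        lo, hi = action_bounds(obs, eps, layers)
        safe += bool(np.all(lo >= a_min) and np.all(hi <= a_max))
    return safe / len(states)

# Hypothetical toy policy: action = relu(x1) - relu(x2).
layers = [(np.eye(2), np.zeros(2)), (np.array([[1.0, -1.0]]), np.zeros(1))]
states = np.array([[0.5, 0.2], [1.0, 0.0], [0.3, 0.3], [0.0, 1.2]])
rate = safety_rate(states, eps=0.1, layers=layers, a_min=-1.0, a_max=1.0)
```

In this toy setup the state `[1.0, 0.0]` yields a nominal action of exactly 1.0, which satisfies the limit, yet its reachable box extends to 1.1 once observation uncertainty is included, so the state counts as unverified. Two of the four states pass, giving a safety rate of 0.5.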

Problem

Research questions and friction points this paper is trying to address.

safe reinforcement learning

perception uncertainty

tail-risk behaviors

robot navigation

safety verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional Value-at-Risk (CVaR)

reachability verification

Taylor Model