RL-STPA: Adapting System-Theoretic Hazard Analysis for Safety-Critical Reinforcement Learning

📅 2026-04-16

🤖 AI Summary

Existing reinforcement learning evaluation methods struggle to systematically identify safety hazards arising from the black-box nature of policies and from distributional shift between training and deployment. This work proposes RL-STPA, a framework that adapts System-Theoretic Process Analysis (STPA) to reinforcement learning settings. By combining hierarchical subtask decomposition driven by temporal phase analysis and domain knowledge, coverage-guided perturbation testing in the state-action space, and hazard-aware reward shaping with curriculum learning, RL-STPA enables systematic risk analysis for safety-critical systems. Evaluated on autonomous drone navigation and landing tasks, the approach uncovers potential hazardous scenarios that standard evaluation protocols can miss, and it provides actionable guidelines for specifying safety boundaries along with quantifiable safety coverage metrics.
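The coverage-guided perturbation testing mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two-dimensional drone state, the grid-cell coverage metric, the Gaussian mutation operator, and every name (`coverage_guided_test`, `toy_policy`, `toy_hazard`, `to_cell`) are hypothetical assumptions chosen for illustration. The idea, borrowed from coverage-guided fuzzing, is that perturbed states reaching a previously unvisited cell of the state space are kept as new seeds, so testing spreads beyond the nominal training distribution while every hazardous state-action pair is logged.

```python
# Hypothetical sketch: not the RL-STPA authors' code; all names and parameters are illustrative.
import random
from typing import Callable, List, Sequence, Set, Tuple

State = Tuple[float, float]  # assumed: (lateral offset from pad in m, vertical speed in m/s)
Cell = Tuple[int, ...]


def to_cell(state: State, bounds: Sequence[Tuple[float, float]], bins: int) -> Cell:
    """Map a continuous state to a grid-cell index, clamping values at the bounds."""
    idx = []
    for value, (lo, hi) in zip(state, bounds):
        frac = (value - lo) / (hi - lo)
        idx.append(min(bins - 1, max(0, int(frac * bins))))
    return tuple(idx)


def coverage_guided_test(
    policy: Callable[[State], float],
    is_hazard: Callable[[State, float], bool],
    seeds: List[State],
    bounds: Sequence[Tuple[float, float]],
    bins: int = 10,
    budget: int = 2000,
    sigma: float = 0.5,
    rng_seed: int = 0,
) -> Tuple[float, List[Tuple[State, float]]]:
    """Perturb states around the seeds; return (cell coverage, hazardous state-action pairs)."""
    rng = random.Random(rng_seed)
    corpus = list(seeds)
    visited: Set[Cell] = {to_cell(s, bounds, bins) for s in corpus}
    hazards: List[Tuple[State, float]] = []
    for _ in range(budget):
        parent = rng.choice(corpus)
        # Gaussian perturbation of every state dimension, clipped to the test bounds
        child = tuple(
            min(hi, max(lo, v + rng.gauss(0.0, sigma))) for v, (lo, hi) in zip(parent, bounds)
        )
        action = policy(child)
        if is_hazard(child, action):
            hazards.append((child, action))
        cell = to_cell(child, bounds, bins)
        if cell not in visited:
            # Coverage guidance: states that reach a new cell become future perturbation seeds
            visited.add(cell)
            corpus.append(child)
    coverage = len(visited) / bins ** len(bounds)
    return coverage, hazards


def toy_policy(state: State) -> float:
    """Hypothetical flawed landing controller: keeps descending until 3 m off the pad."""
    offset, _vz = state
    return -1.0 if abs(offset) < 3.0 else 0.5  # negative value = descend


def toy_hazard(state: State, action: float) -> bool:
    """Hypothetical hazard: commanding descent while more than 2 m from the pad."""
    return action < 0.0 and abs(state[0]) > 2.0


if __name__ == "__main__":
    cov, found = coverage_guided_test(
        toy_policy, toy_hazard, seeds=[(0.0, -1.0)], bounds=[(-5.0, 5.0), (-3.0, 3.0)]
    )
    print(f"cell coverage: {cov:.2f}, hazardous samples: {len(found)}")
```

With the deliberately flawed `toy_policy`, perturbations that start from a centered, nominal landing state eventually reach lateral offsets between 2 m and 3 m, where the controller still commands descent; a harness that only replays nominal states would not exercise that region. The cell coverage value is one possible way to quantify how much of the state space a test campaign has explored.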

📝 Abstract

As reinforcement learning (RL) deployments expand into safety-critical domains, existing evaluation methods fail to systematically identify hazards arising from the black-box nature of neural-network-enabled policies and from distributional shift between training and deployment. This paper introduces Reinforcement Learning System-Theoretic Process Analysis (RL-STPA), a framework that adapts conventional STPA's systematic hazard analysis to address RL's unique challenges through three key contributions: hierarchical subtask decomposition using both temporal phase analysis and domain expertise to capture emergent behaviors, coverage-guided perturbation testing that explores the sensitivity of state-action spaces, and iterative checkpoints that feed identified hazards back into training through reward shaping and curriculum design. We demonstrate RL-STPA in the safety-critical test case of autonomous drone navigation and landing, revealing potential loss scenarios that can be missed by standard RL evaluations. The proposed framework provides practitioners with a toolkit for systematic hazard analysis, quantitative metrics for safety coverage assessment, and actionable guidelines for establishing operational safety bounds. While RL-STPA cannot provide formal guarantees for arbitrary neural policies, it offers a practical methodology for systematically evaluating and improving RL safety and robustness in safety-critical applications where exhaustive verification methods remain intractable.
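One way to feed identified hazards back into training through reward shaping and curriculum design can be sketched in Python. The fixed-penalty shaping, the linear curriculum schedule, and the names `shaped_reward`, `HazardCurriculum`, `penalty`, and `max_hazard_frac` are hypothetical assumptions, not details taken from the paper. Hazardous state-action pairs found during testing are penalized in the reward, and hazardous states are progressively over-represented among episode start states.

```python
# Hypothetical sketch: not the RL-STPA authors' code; all names and parameters are illustrative.
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # assumed: (lateral offset in m, vertical speed in m/s)


def shaped_reward(
    base_reward: float,
    state: State,
    action: float,
    is_hazard: Callable[[State, float], bool],
    penalty: float = 10.0,
) -> float:
    """Subtract a fixed penalty whenever a state-action pair matches an identified hazard."""
    return base_reward - penalty if is_hazard(state, action) else base_reward


class HazardCurriculum:
    """Mix nominal start states with hazardous states found during testing,
    raising the share of hazardous starts linearly from 0 to max_hazard_frac."""

    def __init__(
        self,
        nominal_starts: Sequence[State],
        hazard_starts: Sequence[State],
        stages: int = 4,
        max_hazard_frac: float = 0.5,
        seed: int = 0,
    ) -> None:
        self.nominal: List[State] = list(nominal_starts)
        self.hazard: List[State] = list(hazard_starts)
        self.stages = stages
        self.max_hazard_frac = max_hazard_frac
        self.rng = random.Random(seed)

    def hazard_fraction(self, stage: int) -> float:
        """Share of episodes that start from a hazardous state at this stage."""
        if self.stages <= 1:
            return self.max_hazard_frac
        stage = min(max(stage, 0), self.stages - 1)
        return self.max_hazard_frac * stage / (self.stages - 1)

    def sample_start(self, stage: int) -> State:
        """Draw an initial state for one training episode."""
        if self.hazard and self.rng.random() < self.hazard_fraction(stage):
            return self.rng.choice(self.hazard)
        return self.rng.choice(self.nominal)


if __name__ == "__main__":

    def descend_off_pad(state: State, action: float) -> bool:
        """Hypothetical hazard: commanding descent while more than 2 m from the pad."""
        return action < 0.0 and abs(state[0]) > 2.0

    print(shaped_reward(1.0, (2.5, -1.0), -1.0, descend_off_pad))  # -9.0
    curriculum = HazardCurriculum([(0.0, -1.0)], [(2.5, -1.0)])
    for stage in range(4):
        print(stage, curriculum.hazard_fraction(stage))
```

Ramping the hazardous-start share gradually, rather than applying it from the first stage, is one possible design choice: it lets the policy learn the nominal task before training concentrates on edge cases. In practice the penalty magnitude would need tuning against the scale of the base reward.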

Problem

Research questions and friction points this paper is trying to address.

reinforcement learning

safety-critical systems

hazard analysis

distributional shift

black-box policies

Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-STPA

hazard analysis

coverage-guided perturbation