Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

📅 2026-05-13

🤖 AI Summary

This work addresses the computational cost and model sensitivity of belief-space planning in partially observable, safety-critical control. The authors propose a lightweight risk-gated reinforcement learning approximation that builds a compact proxy state from a finite observation history and learns an action-conditioned predictor of near-term safety violation. The predicted risk of each candidate action is used in two ways: as a penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. Low-risk actions are therefore evaluated closer to reward-seeking estimates and high-risk actions more conservatively, without maintaining or planning over a full belief. On blood glucose regulation (adult and adolescent cohorts), the method improves glycemic trade-offs and substantially reduces runtime relative to a belief-space planning baseline; on Safety-Gym navigation, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines.
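The summary does not specify how the history-based proxy state or the risk penalty is constructed. The following is a minimal Python sketch under assumptions that are ours, not the paper's: the proxy state is a window of the last `k` (observation, previous action) pairs, zero-padded at episode start; the risk predictor outputs a probability in [0, 1] of a near-term violation for the action actually taken; and the penalty is subtracted from the reward inside a standard one-step Q-learning target with a weight `lam`. The names `HistoryProxyState` and `risk_penalized_target` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class HistoryProxyState:
    """Hypothetical finite-history proxy state (an assumption, not the paper's spec).

    Keeps the last k (observation, previous action) pairs and flattens them into
    one fixed-length vector, zero-padding the oldest slots until k steps exist.
    """

    def __init__(self, k: int, obs_dim: int, act_dim: int) -> None:
        self.k = k
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.step_dim = obs_dim + act_dim
        # A deque with maxlen drops the oldest step automatically once full.
        self.buffer: Deque[List[float]] = deque(maxlen=k)

    def reset(self) -> None:
        self.buffer.clear()

    def push(self, obs: Sequence[float], prev_action: Sequence[float]) -> None:
        if len(obs) != self.obs_dim or len(prev_action) != self.act_dim:
            raise ValueError("unexpected observation or action size")
        self.buffer.append([float(x) for x in obs] + [float(x) for x in prev_action])

    def vector(self) -> List[float]:
        # Oldest-first layout: zero padding for missing steps, then stored steps.
        padding = [[0.0] * self.step_dim] * (self.k - len(self.buffer))
        flat: List[float] = []
        for step in padding + list(self.buffer):
            flat.extend(step)
        return flat


def risk_penalized_target(
    reward: float,
    predicted_risk: float,
    next_q_values: Sequence[float],
    gamma: float = 0.99,
    lam: float = 1.0,
    done: bool = False,
) -> float:
    """One-step Q-learning target with the reward shaped by predicted risk (assumed form).

    predicted_risk: assumed probability in [0, 1] that the taken action leads to a
    safety violation within a short horizon; lam is a hypothetical penalty weight.
    """
    shaped_reward = reward - lam * predicted_risk
    bootstrap = 0.0 if done else gamma * max(next_q_values)
    return shaped_reward + bootstrap
```

For instance, `risk_penalized_target(1.0, 0.5, [2.0, 3.0], gamma=0.5, lam=2.0)` evaluates to `1.0 - 2.0 * 0.5 + 0.5 * 3.0 = 1.5`. A fixed window keeps the proxy state's size constant, which is what lets an ordinary value network consume it in place of a belief.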

📝 Abstract

Many safety-critical control problems are modeled as risk-sensitive partially observable Markov decision processes, where the controller must make decisions from incomplete observations while balancing task performance against safety risk. Although belief-space planning provides a principled solution, maintaining and planning over beliefs can be computationally costly and sensitive to model specification in practical domains. We propose a lightweight risk-gated reinforcement learning approximation for risk-sensitive control under partial observability. The method constructs a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violation. This predicted candidate-action risk is used in two complementary ways: as a risk penalty during value learning, and as a decision-time gate that interpolates between optimistic and conservative ensemble value estimates. As a result, low-risk actions are evaluated closer to reward-seeking estimates, while high-risk actions are evaluated more conservatively. We evaluate the approach in two safety-critical partially observable domains: automated glucose regulation and safety-constrained navigation. Across adult and adolescent glucose-control cohorts, the method improves overall glycemic tradeoffs and substantially reduces runtime relative to a belief-space planning baseline. On Safety-Gym navigation benchmarks, it achieves a more favorable reward-cost balance than unconstrained RL and several standard safe-RL baselines. These results suggest that action-conditioned near-term risk can provide an effective local signal for approximate risk-sensitive POMDP control when full belief-space planning is impractical.
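The decision-time gate can be sketched for a discrete action set. The abstract states only that predicted candidate-action risk interpolates between optimistic and conservative ensemble value estimates. This sketch assumes a linear blend weighted by the predicted risk ρ̂(a), takes the ensemble maximum as the optimistic estimate and the ensemble minimum as the conservative one, and selects the action with the highest blended value: Q_gate(a) = (1 − ρ̂(a)) · max_i Q_i(a) + ρ̂(a) · min_i Q_i(a). The functions `gated_value` and `select_action` are hypothetical illustrations, not the authors' code.

```python
from typing import Sequence


def gated_value(ensemble_q: Sequence[float], risk: float) -> float:
    """Blend optimistic (max) and conservative (min) ensemble estimates for one action.

    risk: assumed predicted probability of a near-term safety violation in [0, 1].
    risk = 0 returns the optimistic estimate; risk = 1 returns the conservative one.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    optimistic = max(ensemble_q)
    conservative = min(ensemble_q)
    return (1.0 - risk) * optimistic + risk * conservative


def select_action(
    ensemble_q_per_action: Sequence[Sequence[float]],
    risk_per_action: Sequence[float],
) -> int:
    """Return the index of the action with the highest risk-gated value."""
    scores = [gated_value(q, r) for q, r in zip(ensemble_q_per_action, risk_per_action)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with ensembles `[1.0, 5.0]` and `[2.0, 3.0]` and predicted risks `0.9` and `0.1`, `select_action` returns `1`: the first action has the highest optimistic estimate, but its high predicted risk means it is scored mostly by its pessimistic ensemble member (about 1.4 versus 2.9). Because the gate is applied per candidate action, a single state can mix reward-seeking and conservative evaluation across actions.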

Problem

Research questions and friction points this paper is trying to address.

risk-sensitive control

partial observability

safety-critical systems

POMDP

Innovation

Methods, ideas, or system contributions that make the work stand out.

risk-gated reinforcement learning

partial observability

action-conditioned risk prediction

belief-space approximation