Probabilistic Safety Guarantee for Stochastic Control Systems Using Average Reward MDPs

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses safety verification and policy synthesis for stochastic control systems under known noise distributions. We propose a safety-critical control modeling framework based on average-reward Markov decision processes (MDPs). Our key contribution is the first rigorous reduction of high-confidence state-constraint satisfaction to a standard average-reward MDP formulation, enabling direct application of mature optimization tools—such as linear programming—to synthesize safe policies. Unlike conventional discounted-reward approaches, our formulation eliminates bias induced by discounting and avoids slow convergence. Experimental evaluation on the Double Integrator and inverted pendulum benchmarks demonstrates that the synthesized policies significantly outperform baseline minimum-discounted-reward policies in three critical aspects: convergence speed, completeness of safe-state coverage, and overall policy quality.

Technology Category

Application Category

📝 Abstract
Safety in stochastic control systems, which are subject to random noise with a known probability distribution, aims to compute policies that satisfy predefined operational constraints with high confidence throughout the uncertain evolution of the state variables. The unpredictable evolution of state variables poses a significant challenge for meeting predefined constraints using various control methods. To address this, we present a new algorithm that computes safe policies to determine the safety level across a finite state set. This algorithm reduces the safety objective to the standard average reward Markov Decision Process (MDP) objective. This reduction enables us to use standard techniques, such as linear programs, to compute and analyze safe policies. We validate the proposed method numerically on the Double Integrator and the Inverted Pendulum systems. Results indicate that the average-reward MDPs solution is more comprehensive, converges faster, and offers higher quality compared to the minimum discounted-reward solution.
Problem

Research questions and friction points this paper is trying to address.

Computing safe policies for stochastic control systems under uncertainty
Reducing safety objectives to average reward MDP frameworks
Validating method on Double Integrator and Inverted Pendulum systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces safety objective to average reward MDP
Uses linear programs to compute safe policies
Validated on Double Integrator and Inverted Pendulum
🔎 Similar Papers
No similar papers found.
Saber Omidi
Saber Omidi
Ministry of Education (Iran)
HyperringsOrdered hyperstructuresAlgebraic geometry over hyperstructures
Marek Petrik
Marek Petrik
University of New Hampshire
Machine Learning
S
Se Young Yoon
Electrical and Computer Engineering, University of New Hampshire
M
M. Begum
Department of Computer Science, University of New Hampshire