Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges faced by real-time deep reinforcement learning (DRL) systems, such as sim-to-real gaps, partial observability, and latency, which often lead to failures in the absence of formal fault-tolerance mechanisms that jointly guarantee safety and performance. To bridge this gap, we propose a formal framework that models the switching logic between a DRL agent and a fallback controller using timed automata, which is then transformed into a Markov decision process (MDP) amenable to multi-objective model checking. We introduce a novel convex query technique that simultaneously enforces hard safety constraints and optimizes soft performance objectives. Furthermore, we develop MOPMC, the first GPU-accelerated tool for multi-objective model checking. Experimental results demonstrate that MOPMC scales well in both model size and number of objectives, significantly improving the safety and performance of real-time DRL systems.

📝 Abstract
Deep reinforcement learning (DRL) has emerged as a powerful paradigm for solving complex decision-making problems. However, DRL-based systems still face significant dependability challenges, particularly in real-time environments, due to the simulation-to-reality gap, out-of-distribution observations, and the critical impact of latency. Latency-induced faults, in particular, can lead to unsafe or unstable behaviour, yet existing fault-tolerance approaches for DRL systems lack formal methods to rigorously analyse and optimise performance and safety simultaneously in real-time settings. To address this, we propose a formal framework for designing and analysing real-time switching mechanisms between DRL agents and alternative controllers. Our approach leverages Timed Automata (TAs) for explicit switch-logic design, which is then syntactically converted to a Markov Decision Process (MDP) for formal analysis. We develop a novel convex query technique for multi-objective model checking, enabling the optimisation of soft performance objectives while ensuring hard safety constraints for MDPs. Furthermore, we present MOPMC, a GPU-accelerated software tool implementing this technique, demonstrating superior scalability in both model size and number of objectives.
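The convex-query idea in the abstract can be illustrated with a minimal sketch: repeatedly scalarise the MDP's multiple reward objectives with a weight vector, solve each scalarised problem by value iteration (each solution is a supporting point of the achievable set's convex hull), then keep only points that meet a hard safety threshold and pick the best soft-performance value among them. This is an illustrative toy, not the paper's MOPMC implementation; the transition matrix `P`, two-objective rewards `R`, discount `GAMMA`, and `SAFETY_MIN` threshold are all invented for the example.

```python
import numpy as np

# Toy MDP: 3 states, 2 actions. P[s, a] is a distribution over next states.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],   # state 0
    [[0.0, 0.7, 0.3], [0.5, 0.5, 0.0]],   # state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 (absorbing)
])
# Two reward objectives per (state, action): [safety, performance].
R = np.array([
    [[1.0, 0.2], [0.4, 1.0]],
    [[0.9, 0.1], [0.3, 0.8]],
    [[0.0, 0.0], [0.0, 0.0]],
])
GAMMA = 0.9

def scalarised_value(w, iters=500):
    """Value iteration on the weighted reward w . R; returns the
    per-objective values of the induced policy at state 0."""
    v = np.zeros(3)
    for _ in range(iters):
        q = (R @ w) + GAMMA * (P @ v)        # q[s, a]
        v = q.max(axis=1)
    policy = ((R @ w) + GAMMA * (P @ v)).argmax(axis=1)
    # Evaluate the fixed policy on each objective separately.
    vals = np.zeros((3, 2))
    for _ in range(iters):
        vals = R[np.arange(3), policy] + GAMMA * P[np.arange(3), policy] @ vals
    return vals[0]

# Convex query: sweep weight vectors; each yields a point on the convex
# hull of the achievable set. Enforce the hard safety constraint, then
# maximise the soft performance objective among the admissible points.
SAFETY_MIN = 5.0
best = None
for w0 in np.linspace(0.0, 1.0, 11):
    point = scalarised_value(np.array([w0, 1.0 - w0]))
    if point[0] >= SAFETY_MIN and (best is None or point[1] > best[1]):
        best = point
```

A uniform weight sweep is the crudest possible query strategy; a practical convex-query scheme chooses each new weight vector from the gap between the inner and outer approximations of the achievable set, and MOPMC offloads the underlying value-iteration solves to the GPU at far larger scale.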
Problem

Research questions and friction points this paper is trying to address.

Fault-Tolerant
Real-Time
Deep Reinforcement Learning
Model Checking
Latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Timed Automata
Markov Decision Process
Multi-Objective Model Checking
Fault-Tolerant DRL
Convex Query