Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of scalability and robustness in policy computation for partially observable Markov decision processes (POMDPs) and their hidden-model variants (HM-POMDPs). The authors propose a novel approach that integrates deep reinforcement learning with verifiable finite-state controllers (FSCs). Initially, a recurrent neural network policy is trained using deep reinforcement learning; this policy is then distilled into a compact, formally verifiable FSC. The method is further extended to multi-model uncertainty settings, yielding robust controllers with guaranteed worst-case performance bounds. This study presents the first successful integration of deep reinforcement learning with verifiable FSC synthesis, demonstrating significant improvements over existing solvers on large-scale problems while providing rigorous performance guarantees.
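The iterative robust scheme sketched in the summary can be outlined in code. The following is a minimal sketch, not the paper's implementation: `train_policy`, `extract_fsc`, and `evaluate` are illustrative stubs standing in for deep RL training, controller extraction, and formal FSC evaluation, and the POMDP names are made up. The loop structure, however, mirrors the description above: extract a controller, find its worst-case POMDP, and retrain with that POMDP included.

```python
# Hedged sketch of the iterative robust loop; all functions below are
# illustrative stubs, not the paper's actual components.

def train_policy(pomdps):
    """Stand-in for deep RL training of an RNN policy on a set of POMDPs."""
    return frozenset(pomdps)  # dummy: the 'policy' just remembers its training set

def extract_fsc(policy):
    """Stand-in for distilling the neural policy into a finite-state controller."""
    return policy

def evaluate(fsc, pomdp):
    """Stand-in for formal FSC evaluation; higher is better. Here an FSC
    scores well only on POMDPs it was trained on."""
    return 1.0 if pomdp in fsc else 0.0

all_pomdps = ["M1", "M2", "M3"]          # the hidden-model family (hypothetical)
train_set = {all_pomdps[0]}
for _ in range(len(all_pomdps)):
    fsc = extract_fsc(train_policy(train_set))
    # associate the extracted controller with its worst-case POMDP
    worst = min(all_pomdps, key=lambda m: evaluate(fsc, m))
    if evaluate(fsc, worst) >= 1.0:      # worst case acceptable: robust controller
        break
    train_set.add(worst)                 # retrain with the worst case included

print(sorted(train_set))  # ['M1', 'M2', 'M3']
```

With the dummy evaluator, the loop keeps pulling in the current worst-case POMDP until the controller covers the whole family, which is exactly the fixed-point behavior the summary describes.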

📝 Abstract
Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solvers remains limited. Moreover, many settings require a policy that is robust across multiple POMDPs, further aggravating the scalability issue. We propose the Lexpop framework for POMDP solving. Lexpop (1) employs deep reinforcement learning to train a neural policy, represented by a recurrent neural network, and (2) constructs a finite-state controller mimicking the neural policy through efficient extraction methods. Crucially, unlike neural policies, such controllers can be formally evaluated, providing performance guarantees. We extend Lexpop to compute robust policies for hidden-model POMDPs (HM-POMDPs), which describe finite sets of POMDPs. We associate every extracted controller with its worst-case POMDP. Using a set of such POMDPs, we iteratively train a robust neural policy and consequently extract a robust controller. Our experiments show that on problems with large state spaces, Lexpop outperforms state-of-the-art solvers for POMDPs as well as HM-POMDPs.
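The abstract's key point is that, unlike a neural policy, an FSC can be formally evaluated. One standard way to do this is to build the product Markov chain of the FSC and the POMDP and solve for expected discounted reward. The sketch below illustrates this on a deliberately tiny, made-up two-state POMDP; the state names, the discount factor, and the FSC itself are all assumptions for illustration, not taken from the paper.

```python
# Hedged sketch: evaluating an FSC on a toy POMDP by computing the value of
# the induced product Markov chain over (state, memory-node) pairs.
from itertools import product

# --- Toy POMDP (illustrative): two states alternating deterministically ---
states = [0, 1]
actions = ["a", "b"]
observations = ["even", "odd"]
gamma = 0.5

def T(s, a, s2):   # transition probability P(s2 | s, a)
    return 1.0 if s2 == (s + 1) % 2 else 0.0

def O(s2, o):      # observation probability P(o | s2)
    return 1.0 if o == ("even" if s2 == 0 else "odd") else 0.0

def R(s, a):       # reward for taking action a in state s
    return 1.0 if s == 0 else 0.0

# --- FSC: memory nodes, an action per node, a memory update per observation ---
nodes = ["n0", "n1"]
act = {"n0": "a", "n1": "b"}
upd = {("n0", "even"): "n0", ("n0", "odd"): "n1",
       ("n1", "even"): "n0", ("n1", "odd"): "n1"}

def evaluate_fsc(tol=1e-12):
    """Fixed-point iteration for the product-chain value function
    V(s, n) = R(s, act(n)) + gamma * sum_{s', o} T*O * V(s', upd(n, o))."""
    V = {(s, n): 0.0 for s in states for n in nodes}
    while True:
        delta = 0.0
        for s, n in product(states, nodes):
            a = act[n]
            v = R(s, a) + gamma * sum(
                T(s, a, s2) * O(s2, o) * V[(s2, upd[(n, o)])]
                for s2 in states for o in observations)
            delta = max(delta, abs(v - V[(s, n)]))
            V[(s, n)] = v
        if delta < tol:
            return V

V = evaluate_fsc()
print(round(V[(0, "n0")], 6))  # 4/3 ≈ 1.333333
```

Because the product chain is a plain Markov chain, its value function can also be obtained exactly by solving a linear system, which is what makes the performance guarantees in the abstract possible; the iteration here is just a compact stand-in for that computation.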
Problem

Research questions and friction points this paper is trying to address.

POMDP
scalability
robust policy
HM-POMDP
finite-state controller
Innovation

Methods, ideas, or system contributions that make the work stand out.

finite-state controller
deep reinforcement learning
POMDP
HM-POMDP
robust policy