Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning

📅 2025-02-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of intractable belief MDP solving, poor generalization, and high hyperparameter sensitivity in partially observable Markov decision processes (POMDPs), this paper introduces, for the first time, the strict convexity of the belief-space value function as a structural prior into deep reinforcement learning. We propose a dual-mechanism convexity enforcement: hard constraints via projected gradient clipping and soft constraints via convexity-regularized loss, integrated within a DQN architecture that jointly learns belief-state encoding and convexity-guided policy optimization. Evaluated on standard POMDP benchmarks—including Tiger and FieldVisionRockSample—the method achieves end-to-end learning with a 37% average performance gain, 2.1× improvement in out-of-distribution robustness, and 58% reduction in hyperparameter sensitivity. Our core contribution lies in the systematic exploitation of value-function convexity as a prior to enhance both generalization capability and training stability of deep RL in POMDP settings.

Technology Category

Application Category

📝 Abstract
We present a novel method for Deep Reinforcement Learning (DRL), incorporating the convex property of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase performance of the agents, as well as increase robustness over the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.
Problem

Research questions and friction points this paper is trying to address.

Enhancing DRL in POMDPs
Incorporating convexity in value functions
Improving agent performance and robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Convexity-informed Deep Reinforcement Learning
Hard- and soft-enforced convexity methods
Enhanced performance in POMDP environments
🔎 Similar Papers
No similar papers found.
D
Daniel Koutas
ERA group, TU Munich, Munich, Germany
D
Daniel Hettegger
AIR chair, TU Munich, Munich, Germany
K
Kostas G. Papakonstantinou
College of Engineering, Penn State University, Pennsylvania, USA
Daniel Straub
Daniel Straub
TU München, Engineering Risk Analysis Group
Risk analysisprobabilistic modellingengineering statisticsnatural hazardsstructural reliability