🤖 AI Summary
In safety-critical reinforcement learning under partial observability, existing methods suffer from inadequate risk identification and poor policy generalization because they do not fully exploit privileged information. Method: This paper proposes the ACPOMDP (Asymmetric Constrained Partially Observable Markov Decision Process), a theoretical framework for systematically integrating privileged information into partially observable Markov decision process modeling. Within a world-model framework, the authors design a privileged representation alignment mechanism and an asymmetric actor-critic architecture, enabling privileged-information-guided training and privilege-free safe inference. Contribution/Results: The approach satisfies safety constraints while significantly improving task performance and training stability. It outperforms state-of-the-art safe RL and privileged-learning baselines across multiple benchmark tasks, demonstrating superior convergence, generalization, and engineering practicality.
📝 Abstract
Partial observability presents a significant challenge for safe reinforcement learning, as it impedes the identification of potential risks and rewards. Leveraging specific types of privileged information during training to mitigate the effects of partial observability has yielded notable empirical successes. In this paper, we propose Asymmetric Constrained Partially Observable Markov Decision Processes (ACPOMDPs) to theoretically examine the advantages of incorporating privileged information. Building upon ACPOMDPs, we introduce the Privileged Information Guided Dreamer, a model-based safe reinforcement learning approach that leverages privileged information to enhance the agent's safety and performance through privileged representation alignment and an asymmetric actor-critic structure. Our empirical results demonstrate that our approach significantly outperforms existing methods in terms of safety and task-centric performance. Moreover, compared to alternative privileged model-based reinforcement learning methods, our approach achieves superior performance and is easier to train.
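The asymmetric actor-critic structure mentioned above can be illustrated with a minimal sketch: the critic is conditioned on the privileged full state during training, while the actor only ever sees the partial observation, so deployment needs no privileged information. All class and variable names below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an asymmetric actor-critic interface.
# Toy linear "networks" (plain lists) stand in for real function
# approximators; only the information flow matters here.

class AsymmetricActorCritic:
    def __init__(self, obs_dim: int, state_dim: int, n_actions: int):
        # Actor weights map partial observations to action scores.
        self.actor_w = [[0.0] * obs_dim for _ in range(n_actions)]
        # Critic weights map the privileged full state to a value.
        self.critic_w = [0.0] * state_dim

    def act(self, obs: list[float]) -> int:
        # Actor: partial observation only (privilege-free inference).
        scores = [sum(w * o for w, o in zip(row, obs)) for row in self.actor_w]
        return max(range(len(scores)), key=scores.__getitem__)

    def value(self, state: list[float]) -> float:
        # Critic: privileged full state, used only during training
        # to produce lower-variance learning targets for the actor.
        return sum(w * s for w, s in zip(self.critic_w, state))
```

At training time both `act` and `value` are used; at deployment only `act` is called, which is what makes the inference step privilege-free.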