Deep Gaussian Process Proximal Policy Optimization

📅 2025-11-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep neural networks in reinforcement learning often produce poorly calibrated uncertainty estimates, hindering safe exploration and reliable decision-making. To address this, we propose a scalable, model-free Actor-Critic algorithm that integrates Deep Gaussian Processes (DGPs) with Proximal Policy Optimization (PPO), the first such approach to achieve both high-performance policy learning and well-calibrated uncertainty quantification in high-dimensional continuous control tasks. Our method employs stochastic variational inference for efficient DGP training and jointly models uncertainty in both the policy (Actor) and value function (Critic). Experiments on standard benchmarks demonstrate that the algorithm matches PPO’s asymptotic performance while delivering significantly improved uncertainty calibration—evidenced by lower expected calibration error and sharper, more reliable confidence intervals. This enhanced calibration directly improves exploration safety and decision trustworthiness in uncertain environments.
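The summary above says GPPO keeps PPO's policy optimization while swapping DGPs in for the networks. The PPO part can be sketched as the standard clipped surrogate loss; the function name and toy numbers below are illustrative, not taken from the paper:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (negated, so lower is better).

    ratio     = pi_new(a|s) / pi_old(a|s) per sample; in GPPO the policy
                would be a DGP rather than a neural network.
    advantage = advantage estimate from the critic.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Pessimistic min of clipped/unclipped terms, averaged over the batch.
    return -np.minimum(unclipped, clipped).mean()

# Toy batch: the third ratio (1.5) is clipped to 1.2 with eps = 0.2.
ratios = np.array([0.9, 1.1, 1.5])
advantages = np.ones(3)
loss = ppo_clip_loss(ratios, advantages)
```

The clipping keeps each policy update close to the previous policy, which is what lets GPPO reuse PPO's on-policy training loop around a DGP actor.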

📝 Abstract
Uncertainty estimation for Reinforcement Learning (RL) is a critical component in control tasks where agents must balance safe exploration and efficient learning. While deep neural networks have enabled breakthroughs in RL, they often lack calibrated uncertainty estimates. We introduce Deep Gaussian Process Proximal Policy Optimization (GPPO), a scalable, model-free actor-critic algorithm that leverages Deep Gaussian Processes (DGPs) to approximate both the policy and value function. GPPO maintains competitive performance with respect to Proximal Policy Optimization on standard high-dimensional continuous control benchmarks while providing well-calibrated uncertainty estimates that can inform safer and more effective exploration.
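The calibration claim above is typically measured with expected calibration error (ECE). A minimal sketch of the classification-style ECE follows; the paper most likely uses a regression variant over confidence intervals, so this is an illustration of the metric rather than the paper's exact evaluation:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average of |accuracy - confidence| over confidence bins.

    confidences: predicted confidence per sample, in (0, 1].
    correct:     1 if the prediction was right, else 0.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()
            conf = confidences[mask].mean()
            ece += mask.mean() * abs(acc - conf)
    return ece

# Well calibrated: 80% confidence, 80% accuracy -> ECE ~ 0.
calibrated_ece = expected_calibration_error(
    np.full(10, 0.8), np.array([1] * 8 + [0] * 2))
# Overconfident: 90% confidence, 50% accuracy -> ECE = 0.4.
overconfident_ece = expected_calibration_error(
    np.full(10, 0.9), np.array([1] * 5 + [0] * 5))
```

A lower ECE means the model's stated confidence tracks its actual accuracy, which is the property the abstract claims for GPPO's uncertainty estimates.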
Problem

Research questions and friction points this paper is trying to address.

Deep neural networks in RL typically produce poorly calibrated uncertainty estimates
Control tasks require agents to balance safe exploration against efficient learning
Without reliable uncertainty quantification, agents cannot strike that balance safely
Innovation

Methods, ideas, or system contributions that make the work stand out.

Approximates both the policy (actor) and the value function (critic) with Deep Gaussian Processes
Trains the DGPs scalably via stochastic variational inference
Matches PPO's asymptotic performance while providing calibrated uncertainty for safer exploration
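The bullets above hinge on a Gaussian Process returning a predictive mean and variance instead of a point estimate. A minimal single-layer exact GP regression sketch (NumPy, RBF kernel) shows the mechanism; the paper's method stacks such layers into a DGP and trains them variationally, which this toy example does not attempt:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """RBF kernel matrix between two sets of points (n, d) and (m, d)."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-2):
    """Exact GP posterior predictive mean and variance at X_test."""
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf(X_train, X_test)
    K_ss = rbf(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss - v.T @ v)
    return mean, var

# Variance is small near observed states and grows far from them --
# exactly the signal an uncertainty-aware critic can use for exploration.
X_train = np.array([[0.0], [1.0]])
y_train = np.array([0.0, 1.0])
X_test = np.array([[0.0], [5.0]])
mean, var = gp_posterior(X_train, y_train, X_test)
```

In an actor-critic setting, high predictive variance at a state flags it as poorly understood, which is what the summary means by uncertainty informing safer exploration.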