🤖 AI Summary
Deep neural networks in reinforcement learning often produce poorly calibrated uncertainty estimates, hindering safe exploration and reliable decision-making. To address this, we propose Deep Gaussian Process Proximal Policy Optimization (GPPO), a scalable, model-free actor-critic algorithm that integrates Deep Gaussian Processes (DGPs) with Proximal Policy Optimization (PPO), the first such approach to achieve both high-performance policy learning and well-calibrated uncertainty quantification in high-dimensional continuous control tasks. The method trains the DGPs with stochastic variational inference and models uncertainty in both the policy (actor) and the value function (critic). Experiments on standard benchmarks show that GPPO matches PPO's asymptotic performance while delivering markedly better uncertainty calibration, evidenced by lower expected calibration error and sharper, more reliable confidence intervals. This improved calibration directly supports safer exploration and more trustworthy decisions in uncertain environments.
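To make the calibration claim concrete, the sketch below computes one common expected-calibration-error variant for regression: the mean absolute gap between the nominal and empirical coverage of central Gaussian credible intervals. This is a generic metric for illustration; the paper's exact ECE definition may differ, and the synthetic data here is our own.

```python
import numpy as np
from statistics import NormalDist

def regression_calibration_error(y_true, mu, sigma,
                                 levels=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Mean absolute gap between nominal coverage p and the empirical
    fraction of targets inside the central p-credible interval of a
    Gaussian predictive distribution (illustrative ECE variant)."""
    gaps = []
    for p in levels:
        # Half-width of the central p-interval, in predicted std units.
        z = NormalDist().inv_cdf(0.5 + p / 2.0)
        inside = np.abs(y_true - mu) <= z * sigma
        gaps.append(abs(inside.mean() - p))
    return float(np.mean(gaps))

# Synthetic check: predictions whose sigma matches the true noise scale
# are well calibrated; halving sigma makes them overconfident.
rng = np.random.default_rng(0)
mu = rng.normal(size=10_000)
y = mu + rng.normal(size=10_000)
well = regression_calibration_error(y, mu, np.ones_like(mu))
over = regression_calibration_error(y, mu, 0.5 * np.ones_like(mu))
```

Under this metric, the overconfident predictor's intervals cover far fewer targets than their nominal level promises, which is exactly the failure mode the summary attributes to standard deep RL value estimates.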
📝 Abstract
Uncertainty estimation is a critical component of Reinforcement Learning (RL) in control tasks where agents must balance safe exploration against efficient learning. While deep neural networks have enabled breakthroughs in RL, they often lack calibrated uncertainty estimates. We introduce Deep Gaussian Process Proximal Policy Optimization (GPPO), a scalable, model-free actor-critic algorithm that leverages Deep Gaussian Processes (DGPs) to approximate both the policy and the value function. GPPO remains competitive with Proximal Policy Optimization (PPO) on standard high-dimensional continuous control benchmarks while providing well-calibrated uncertainty estimates that can inform safer and more effective exploration.
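As a minimal illustration of how a calibrated critic could inform safer exploration, the rule below gates an action on the lower confidence bound of its predicted value. The gating rule and the `safe_to_act` helper are our own illustrative assumptions, not part of GPPO as described here.

```python
def safe_to_act(value_mean, value_std, threshold, z=1.645):
    """Illustrative safety gate: act only if the ~95% lower confidence
    bound on the critic's predicted value clears a safety threshold.
    With a calibrated critic, this bound is trustworthy; an
    overconfident critic (too-small value_std) would pass unsafe
    actions through."""
    return value_mean - z * value_std > threshold

# Same predicted mean, different uncertainty: only the confident
# estimate clears the gate.
confident = safe_to_act(1.0, 0.1, threshold=0.5)   # bound ~0.84
uncertain = safe_to_act(1.0, 0.5, threshold=0.5)   # bound ~0.18
```

The point of the example is that such a rule is only as good as the critic's calibration, which is why the abstract emphasizes calibrated uncertainty rather than raw performance alone.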