On Efficient Bayesian Exploration in Model-Based Reinforcement Learning

📅 2025-07-03
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address low data efficiency in model-based reinforcement learning, this paper proposes PTS-BE, a framework that drives exploration via Bayesian information gain (IG) rewards targeting regions of high epistemic uncertainty. Theoretically, the authors prove that the IG reward quantifies epistemic model uncertainty and converges to zero once the environment's dynamics and rewards are fully learned, providing rigorous grounding for Bayesian exploration. Methodologically, PTS-BE integrates sparse variational Gaussian processes, deep kernel learning, and deep ensembles to enable scalable, high-fidelity posterior modeling. Empirical evaluations demonstrate that PTS-BE significantly outperforms state-of-the-art baselines on sparse-reward and pure-exploration tasks, achieving 2–5× improvements in sample efficiency. These results validate its dual capability: efficient environment exploration and effective policy learning under stringent data budgets.
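
The paper does not include code here; as a minimal sketch of how such an IG bonus can be computed from a deep ensemble (one of the posterior approximations named above), the snippet below uses the standard Gaussian information-gain formula IG = 0.5 * log(1 + σ²_epistemic / σ²_noise), with ensemble disagreement standing in for epistemic variance. The function name, shapes, and noise model are assumptions, not the authors' implementation.

```python
import numpy as np

def information_gain_bonus(member_means: np.ndarray, noise_var: float) -> np.ndarray:
    """Gaussian information-gain bonus from ensemble disagreement.

    member_means: shape (n_members, dim), each row one ensemble member's
        predicted next-state mean at the same input.
    noise_var: assumed aleatoric (observation) noise variance.

    Under a Gaussian approximation, IG = 0.5 * log(1 + var_epistemic / noise_var),
    which is exactly zero when all members agree (no epistemic uncertainty left).
    """
    epistemic_var = member_means.var(axis=0)          # disagreement across members
    return 0.5 * np.log1p(epistemic_var / noise_var)  # -> 0 as disagreement -> 0

rng = np.random.default_rng(0)
confident = np.full((5, 3), 1.0)                   # members agree: learned region
uncertain = rng.normal(size=(5, 3))                # members disagree: novel region
print(information_gain_bonus(confident, 0.1))      # ~[0. 0. 0.]
print(information_gain_bonus(uncertain, 0.1))      # strictly positive bonuses
```

Note that the bonus vanishes exactly when the ensemble members agree, mirroring the convergence-to-zero property the paper proves for IG rewards.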

📝 Abstract
In this work, we address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. Specifically, we focus on a class of exploration bonuses that targets epistemic uncertainty rather than the aleatoric noise inherent in the environment. We prove that these bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment's dynamics and rewards, thereby aligning exploration with genuine knowledge gaps. Our analysis provides formal guarantees for information gain (IG) based approaches, which previously lacked theoretical grounding. To enable practical use, we also discuss tractable approximations via sparse variational Gaussian processes, deep kernels, and deep ensemble models. We then outline a general framework, Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE), which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration. We empirically demonstrate that PTS-BE substantially outperforms competing baselines across a variety of environments characterized by sparse rewards and/or purely exploratory tasks.
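
The abstract only outlines PTS-BE at a high level. Assuming "predictive trajectory sampling" amounts to random-shooting model-predictive planning over a learned ensemble model, one planning step might look like the sketch below; `ensemble_step`, `reward_fn`, `ig_bonus_fn`, and all hyperparameters are hypothetical placeholders, not the paper's API.

```python
import numpy as np

def pts_be_plan(state, ensemble_step, reward_fn, ig_bonus_fn, action_dim,
                n_members=5, horizon=10, n_candidates=256, beta=1.0, seed=0):
    """One hypothetical PTS-BE planning step (random-shooting MPC sketch).

    ensemble_step(members, action) -> next member states, shape (n_members, dim),
        one rollout per posterior sample / ensemble member.
    reward_fn(state_mean, action)  -> scalar extrinsic reward estimate.
    ig_bonus_fn(members)           -> scalar epistemic information-gain bonus.
    """
    rng = np.random.default_rng(seed)
    plans = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, plan in enumerate(plans):                       # score each candidate plan
        members = np.repeat(state[None, :], n_members, axis=0)
        for action in plan:                                # predictive trajectory sampling
            members = ensemble_step(members, action)
            extrinsic = reward_fn(members.mean(axis=0), action)
            returns[i] += extrinsic + beta * ig_bonus_fn(members)
    return plans[np.argmax(returns), 0]                    # execute first action, then replan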
Problem

Research questions and friction points this paper is trying to address.

Address data-efficient exploration in reinforcement learning
Ground intrinsic motivation in epistemic uncertainty rather than aleatoric noise
Integrate model-based planning with exploration bonuses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Targets epistemic uncertainty for exploration
Uses sparse variational Gaussian Processes for scalable posteriors (see the sketch after this list)
Integrates model-based planning with Bayesian exploration
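
For the Gaussian-process ingredient, the toy example below (my own, in plain NumPy, not the paper's model) computes the exact GP posterior variance to show the behavior a sparse variational approximation preserves: epistemic variance collapses where data has been collected and stays high in novel regions, which is precisely what an IG bonus rewards visiting.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior_var(x_train, x_query, noise_var=0.05):
    """Exact GP posterior variance; sparse variational GPs approximate this
    with m << n inducing points while keeping the same qualitative behavior."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = rbf(x_query, x_train)
    # k(x, x) = 1 for the RBF kernel, minus the variance explained by the data.
    return 1.0 - np.einsum("ij,jk,ik->i", k_star, np.linalg.inv(K), k_star)

x_train = np.array([0.0, 0.1, 0.2])        # states the agent has visited
x_query = np.array([0.1, 3.0])             # one visited point, one novel point
print(gp_posterior_var(x_train, x_query))  # small near data, near 1.0 far away
```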