Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

πŸ“… 2024-10-07
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address low sample efficiency in robot policy learning under sparse rewards, action penalties, and hard-to-explore regions, this paper proposes an optimistic model-predictive method based on Thompson sampling. The core contribution is the first Bayesian neural network architecture that supports joint uncertainty inference over both the transition and reward functions; crucially, optimism is integrated directly into the Bayesian belief, driven by reward-state correlations rather than independent parameter perturbations. Embedded in a model-based reinforcement learning framework, the method is evaluated on continuous-control benchmarks in MuJoCo and VMAS. Results show significantly accelerated convergence (2.3× average speedup), decisive exploration gains in high-uncertainty regions, and empirical evidence that joint uncertainty modeling is essential for effective exploration guidance.

πŸ“ Abstract
Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic exploration emerging as a promising direction aligned with this idea and enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our model structure is the first that allows for reasoning about joint uncertainty over transitions and rewards. We apply our method on a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
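As a toy illustration of the selection step the abstract describes, the sketch below draws several candidate models from an approximate posterior (here, random draws over a linear model standing in for the paper's Bayesian neural network) and plans with the most promising one. Each candidate jointly predicts the next state and the reward, so reward uncertainty is tied to state uncertainty. All names and the model form are hypothetical, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

class JointModel:
    """One posterior draw: jointly predicts next state and reward.
    Illustrative linear stand-in, not the paper's Bayesian NN."""
    def __init__(self, rng):
        self.A = rng.normal(0.9, 0.05, size=(2, 2))  # transition weights
        self.w = rng.normal(1.0, 0.3, size=2)        # reward weights

    def step(self, s, a):
        s_next = self.A @ s + a
        r = self.w @ s_next                          # reward depends on the predicted state
        return s_next, r

def rollout_return(model, s0, policy, horizon=5):
    """Imagined return of a fixed policy under one sampled model."""
    s, total = s0, 0.0
    for _ in range(horizon):
        s, r = model.step(s, policy(s))
        total += r
    return total

def optimistic_thompson_sample(posterior, s0, policy, k=8):
    """Draw k models from the (approximate) posterior and keep the one
    with the highest imagined return -- the optimistic sample."""
    samples = [posterior() for _ in range(k)]
    return max(samples, key=lambda m: rollout_return(m, s0, policy))

posterior = lambda: JointModel(rng)   # stand-in for sampling a Bayesian NN posterior
policy = lambda s: -0.1 * s           # fixed toy policy
s0 = np.ones(2)

plain = posterior()                                            # standard Thompson sampling: one draw
optimistic = optimistic_thompson_sample(posterior, s0, policy) # optimistic variant: best of k draws
```

Standard Thompson sampling plans with a single posterior draw; the optimistic variant biases planning toward draws whose joint transition-reward belief promises high return, which is the exploration signal the paper argues matters in sparse-reward settings.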
Problem

Research questions and friction points this paper is trying to address.

How to make optimistic exploration practical for sample-efficient reinforcement learning.
How to reason about joint uncertainty over transitions and rewards.
How to accelerate learning in environments with sparse rewards, action penalties, and hard-to-explore regions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimistic Thompson Sampling for exploration
Joint uncertainty modeling for transitions and rewards
Accelerated learning in sparse reward environments
πŸ”Ž Similar Papers
No similar papers found.