🤖 AI Summary
This work addresses the computational intractability of jointly optimizing belief inference and planning in Bayes-adaptive reinforcement learning. It proposes a variational framework that, for the first time, integrates variational belief learning, sequential Monte Carlo (SMC) planning, and meta-reinforcement learning within Bayes-adaptive Markov decision processes, so that learning and planning are optimized jointly. The resulting method improves both sample efficiency and runtime, scaling to larger planning budgets on a single GPU and easing the scalability and computational bottlenecks that have limited existing methods in this domain.
📝 Abstract
Optimally trading off exploration and exploitation is the holy grail of reinforcement learning, as it promises maximal data-efficiency for solving any task. Bayes-optimal agents achieve this, but obtaining the belief state and planning over it are both typically intractable. Although deep learning methods can greatly help in scaling this computation, existing methods remain costly to train. To accelerate this, the paper proposes a variational framework for learning and planning in Bayes-adaptive Markov decision processes that coalesces variational belief learning, sequential Monte Carlo planning, and meta-reinforcement learning. In a single-GPU setup, the new method, VariBASeD, scales favorably to larger planning budgets, improving sample- and runtime-efficiency over prior methods.
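To make the belief-inference ingredient concrete, here is a minimal, illustrative sketch of sequential Monte Carlo belief tracking in the simplest Bayes-adaptive setting, a two-armed Bernoulli bandit with unknown reward probabilities. The bandit setup, function names, and particle counts are assumptions for illustration only, not the paper's actual model or planner:

```python
import numpy as np

def smc_belief_update(particles, weights, arm, reward, rng):
    """One SMC belief step: reweight each particle (a hypothesis over the
    arms' success probabilities) by the reward likelihood, then resample
    when the effective sample size collapses."""
    lik = particles[:, arm] ** reward * (1.0 - particles[:, arm]) ** (1 - reward)
    weights = weights * lik
    weights = weights / weights.sum()
    ess = 1.0 / np.sum(weights ** 2)          # effective sample size
    if ess < len(particles) / 2:              # bootstrap resampling step
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights

rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.8])                 # hidden task parameters
particles = rng.uniform(size=(1000, 2))       # samples from a uniform prior
weights = np.full(1000, 1.0 / 1000)

for _ in range(200):
    arm = int(rng.integers(2))                # uniform exploration policy
    reward = int(rng.uniform() < true_p[arm])
    particles, weights = smc_belief_update(particles, weights, arm, reward, rng)

posterior_mean = weights @ particles          # belief concentrates near true_p
```

A Bayes-adaptive planner would roll out candidate action sequences under such a particle belief; the paper's contribution is to learn the belief variationally and amortize this planning, rather than running the filter from scratch.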