Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions

📅 2025-12-24

📈 Citations: 0

✨ Influential: 0

career value

212K/year

🤖 AI Summary

Classical Bayesian reinforcement learning (BRL) suffers from poor generalization due to its reliance on known transition and reward model structures, rendering it inadequate for tasks with ambiguous or unknown parameters. To address this, we propose a novel paradigm that decouples task representation from dynamics modeling. Our core innovation is a learnable, nonlinear basis-function-driven generalized linear model, enabling closed-form marginal likelihood computation for transition and reward functions and facilitating exact Bayesian inference—thereby avoiding the bias inherent in variational inference (e.g., ELBO optimization). Integrated within a deep meta-RL framework, our method achieves up to a 2.7× improvement in success rate over VariBAD on the MetaWorld ML10 and ML45 benchmarks. It also consistently outperforms established baselines—including MAML and RL²—with lower variance, more stable convergence, and superior cross-task generalization.

Technology Category

Application Category

📝 Abstract

Bayesian Reinforcement Learning (BRL) provides a framework for generalisation of Reinforcement Learning (RL) problems from its use of Bayesian task parameters in the transition and reward models. However, classical BRL methods assume known forms of transition and reward models, reducing their applicability in real-world problems. As a result, recent deep BRL methods have started to incorporate model learning, though the use of neural networks directly on the joint data and task parameters requires optimising the Evidence Lower Bound (ELBO). ELBOs are difficult to optimise and may result in indistinctive task parameters, hence compromised BRL policies. To this end, we introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL), that enables efficient and accurate learning of transition and reward models, with fully tractable marginal likelihood and Bayesian inference on task parameters and model noises. On challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of one of the state-of-the-art deep BRL methods, VariBAD, by up to 2.7x. Comparing against representative or recent deep BRL / Meta-RL methods, such as MAML, RL2, SDVT, TrMRL and ECET, GLiBRL also demonstrates its low-variance and decent performance consistently.

Problem

Research questions and friction points this paper is trying to address.

Enables efficient learning of transition and reward models in Bayesian RL

Addresses optimization difficulties of ELBO in deep Bayesian RL methods

Improves generalization and performance on challenging MetaWorld benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

GLiBRL uses learnable basis functions for efficient model learning

It enables tractable marginal likelihood and Bayesian inference

The method improves success rates on MetaWorld benchmarks significantly

🔎 Similar Papers

No similar papers found.