AI Summary
This work addresses two difficulties: inconsistency in successor state measure (SSM) modeling, and the integration of dynamic-programming principles into policy representation. We introduce diffusion models to SSM estimation for the first time and propose the Bellman Flow Constraint, a novel, explicit Bellman update mechanism imposed at each diffusion step. This constraint enforces that the intermediate diffusion distributions satisfy the policy-dependent Bellman equation, thereby ensuring theoretical consistency of the SSM estimate. The method operates offline, without environment interaction, making it suitable for offline reinforcement learning. It achieves significant improvements over existing SSM and diffusion-based baselines on policy evaluation and cross-task transfer, validated on standard benchmarks including D4RL and Offline RL Gym. The core contribution is a theoretically grounded framework coupling diffusion processes with dynamic programming, yielding a differentiable, falsifiable, Bellman-consistent SSM estimator.
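For reference, the standard Bellman flow equation that such a constraint enforces on the discounted state occupancy can be written as follows; the notation here is ours (a generic statement of the constraint, not an equation taken from the paper):

```latex
% Bellman flow equation for the discounted state occupancy d^\pi:
% \rho_0 is the initial state distribution, P the environment dynamics,
% \pi the policy, and \gamma the discount factor.
d^{\pi}(s') \;=\; (1-\gamma)\,\rho_0(s')
  \;+\; \gamma \int_{\mathcal{S}\times\mathcal{A}}
      P(s' \mid s, a)\,\pi(a \mid s)\,d^{\pi}(s)\,\mathrm{d}s\,\mathrm{d}a .
```

Imposing this fixed-point relation on the intermediate distributions of a diffusion model is what ties the generative process to dynamic programming.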
Abstract
Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.
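As a concrete toy illustration of the Bellman flow constraint referred to above, the sketch below solves for the discounted state occupancy of a small tabular MDP in closed form and checks that it satisfies the flow equation. The MDP, variable names, and discount value are all illustrative assumptions, not details from the paper; the paper's method enforces this constraint on diffusion-step distributions rather than solving a linear system.

```python
import numpy as np

# Hypothetical tabular MDP with 3 states; the policy is already folded
# into the state-to-state transition matrix P_pi (rows sum to 1).
rng = np.random.default_rng(0)
P_pi = rng.random((3, 3))
P_pi /= P_pi.sum(axis=1, keepdims=True)

rho0 = np.array([1.0, 0.0, 0.0])  # initial state distribution
gamma = 0.9                       # discount factor (illustrative)

# Bellman flow constraint: d = (1 - gamma) * rho0 + gamma * P_pi^T d.
# Its unique fixed point is d = (1 - gamma) * (I - gamma * P_pi^T)^{-1} rho0.
d = (1 - gamma) * np.linalg.solve(np.eye(3) - gamma * P_pi.T, rho0)

# Verify the flow constraint holds and that d is a probability distribution.
residual = d - ((1 - gamma) * rho0 + gamma * P_pi.T @ d)
print(np.abs(residual).max(), d.sum())
```

The residual is zero up to machine precision, and the occupancy sums to one automatically because each row of `P_pi` sums to one.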