🤖 AI Summary
Existing reinforcement learning methods struggle to simultaneously ensure robust state inference and policy generalization in perturbed and partially observable Markov decision processes (P²OMDPs).
Method: We propose CaDiff—the first state representation framework that integrates diffusion models with causal-theoretic guarantees. It employs an asynchronous diffusion model (ADM) to characterize the observation generation process, introduces a novel dual-similarity metric to formalize causal equivalence, and constructs causal state representations that are provably denoisable. We further derive a theoretical upper bound on the value function estimation error to guarantee the reliability of policy learning.
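The causal-equivalence idea behind the dual-similarity metric can be illustrated with a classical bisimulation metric on a toy tabular MDP: two states are close when their immediate rewards are close and their next-state distributions are close under the metric itself (a fixed-point definition). The sketch below is a generic bisimulation metric in that spirit, not the paper's exact dual-similarity construction; the toy MDP, the discount `c`, and the iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def w1(p, q, D):
    """1-Wasserstein distance between discrete distributions p, q
    under ground metric D, solved as a small transport LP."""
    n = len(p)
    A_eq = np.zeros((2 * n, n * n))
    for k in range(n):
        A_eq[k, k * n:(k + 1) * n] = 1.0  # row marginals equal p
        A_eq[n + k, k::n] = 1.0           # column marginals equal q
    res = linprog(D.flatten(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun

def bisim_metric(R, P, c=0.9, iters=60):
    """Fixed-point iteration of d(i,j) = |R[i]-R[j]| + c * W1(P[i], P[j]; d)."""
    n = len(R)
    d = np.zeros((n, n))
    for _ in range(iters):
        d = np.array([[abs(R[i] - R[j]) + c * w1(P[i], P[j], d)
                       for j in range(n)] for i in range(n)])
    return d

# Toy 3-state MDP: states 0 and 1 are behaviourally equivalent
# (same reward, same transitions), state 2 is an absorbing state
# with a different reward.
R = np.array([1.0, 1.0, 0.0])
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0]])
d = bisim_metric(R, P)
print(np.round(d, 3))  # d[0,1] ~ 0; d[0,2] = d[1,2] ~ 1/(1-0.9) = 10
```

States 0 and 1 collapse to distance zero, so any representation respecting this metric may safely merge them; the perturbation-robust variant in CaDiff additionally measures similarity between perturbed observations and their causal counterparts.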
Results: On the Roboschool benchmark, CaDiff achieves ≥14.18% average return improvement over state-of-the-art baselines, significantly enhancing policy robustness and performance under observation noise and occlusion.
📝 Abstract
A critical challenge for reinforcement learning (RL) is making decisions based on incomplete and noisy observations, especially in perturbed and partially observable Markov decision processes (P$^2$OMDPs). Existing methods fail to mitigate perturbations while simultaneously addressing partial observability. We propose *Causal State Representation under Asynchronous Diffusion Model (CaDiff)*, a framework that enhances any RL algorithm by uncovering the underlying causal structure of P$^2$OMDPs. This is achieved by incorporating a novel asynchronous diffusion model (ADM) and a new bisimulation metric. ADM allows the forward and reverse processes to use different numbers of steps, so the perturbation in a P$^2$OMDP can be interpreted as part of the noise and suppressed through diffusion. The bisimulation metric quantifies the similarity between partially observable environments and their causal counterparts. Moreover, we establish a theoretical guarantee for CaDiff by deriving an upper bound on the value function approximation error between perturbed observations and denoised causal states, reflecting a principled trade-off between reward and transition-model approximation errors. Experiments on Roboschool tasks show that CaDiff improves returns by at least 14.18% compared to baselines. CaDiff is the first framework that approximates causal states using diffusion models with both theoretical rigor and practicality.
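The asynchronous-diffusion intuition, treating an observation perturbation as noise injected at some intermediate diffusion timestep and then denoising from that timestep rather than from the final one, can be sketched in a toy 1-D Gaussian setting where the posterior mean is available in closed form. Everything below (the linear-Gaussian model, the function `denoise_from_tstar`, the way the timestep is matched to the noise level) is an illustrative assumption, not the authors' ADM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: latent causal state x0 ~ N(0, 1); the agent sees a
# perturbed observation o = x0 + pert with pert ~ N(0, s^2).
# Asynchronous idea (illustrative): pick an intermediate timestep t*
# whose signal rate ab satisfies (1 - ab) / ab = s^2, embed o at t*,
# and run the reverse process from there. The reverse chain thus uses
# a different number of steps than any explicit forward noising.

def denoise_from_tstar(o, s2):
    """Posterior-mean denoising under the Gaussian toy model.

    With x0 ~ N(0,1), sqrt(ab)*o matches the variance-preserving
    forward marginal at t*, and the exact posterior mean collapses
    to the shrinkage estimate E[x0 | o] = o / (1 + s2)."""
    ab = 1.0 / (1.0 + s2)          # signal rate matched to perturbation level
    x_tstar = np.sqrt(ab) * o      # embed the observation at timestep t*
    return np.sqrt(ab) * x_tstar   # reverse to t=0: equals o / (1 + s2)

n, s = 10_000, 1.0
x0 = rng.standard_normal(n)
obs = x0 + s * rng.standard_normal(n)

mse_raw = np.mean((obs - x0) ** 2)
mse_den = np.mean((denoise_from_tstar(obs, s ** 2) - x0) ** 2)
print(f"MSE raw={mse_raw:.3f}  denoised={mse_den:.3f}")
```

Even this closed-form one-step reverse roughly halves the squared error at s = 1 (the Bayes-optimal 0.5 versus 1.0 for the raw observation), which is the sense in which the perturbation is "part of the noise suppressed through diffusion"; CaDiff replaces the closed-form posterior with a learned multi-step reverse process.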