Confounding Robust Deep Reinforcement Learning: A Causal Approach

📅 2025-10-23

📈 Citations: 0

✨ Influential: 0

career value

220K/year

🤖 AI Summary

Unobserved confounders in high-dimensional offline reinforcement learning induce policy estimation bias due to distributional mismatch between behavior and target policies. Method: We propose a causally grounded robust deep Q-learning framework that explicitly models confounding bias within the DQN architecture by integrating causal inference with robust optimization—specifically, evaluating and optimizing policies under worst-case environmental perturbations induced by latent confounders. Results: Evaluated on 12 Atari games with synthetically introduced confounding structures, our method significantly outperforms standard DQN and state-of-the-art offline RL baselines, demonstrating robustness to unobserved confounding and strong generalization. Contribution: This work is the first to systematically embed causal reasoning into deep Q-learning, yielding a scalable and empirically verifiable solution for offline policy learning under unobserved confounding.

Technology Category

Application Category

📝 Abstract

A key task in Artificial Intelligence is learning effective policies for controlling agents in unknown environments to optimize performance measures. Off-policy learning methods, like Q-learning, allow learners to make optimal decisions based on past experiences. This paper studies off-policy learning from biased data in complex and high-dimensional domains where emph{unobserved confounding} cannot be ruled out a priori. Building on the well-celebrated Deep Q-Network (DQN), we propose a novel deep reinforcement learning algorithm robust to confounding biases in observed data. Specifically, our algorithm attempts to find a safe policy for the worst-case environment compatible with the observations. We apply our method to twelve confounded Atari games, and find that it consistently dominates the standard DQN in all games where the observed input to the behavioral and target policies mismatch and unobserved confounders exist.

Problem

Research questions and friction points this paper is trying to address.

Addressing unobserved confounding in off-policy reinforcement learning

Developing robust algorithms for biased high-dimensional observational data

Improving policy safety in environments with confounding biases

Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust deep reinforcement learning algorithm

Addresses unobserved confounding in data

Finds safe policy for worst-case environment

🔎 Similar Papers

No similar papers found.