🤖 AI Summary
This work addresses the lack of robustness in existing reward shaping methods for offline continuous control when unobserved confounders are present. It introduces causal inference into automatic reward shaping for the first time, leveraging a causal Bellman equation to construct a tight upper bound on the optimal state value, which is then employed as a potential function within the potential-based reward shaping (PBRS) framework. Integrated with Soft Actor-Critic (SAC) under offline reinforcement learning settings, the proposed approach effectively mitigates confounding bias and significantly enhances both the stability and performance of policy training. Empirical results demonstrate that the method maintains strong robustness and superior performance across multiple standard continuous control benchmarks, even in the presence of unobserved confounders, thereby offering a novel causal perspective for offline reinforcement learning.
📝 Abstract
Reward shaping has been widely applied to accelerate the training of Reinforcement Learning (RL) agents. However, a principled way of designing effective reward shaping functions, especially for complex continuous control problems, remains largely under-explored. In this work, we propose to automatically learn a reward shaping function for continuous control problems from offline datasets, potentially contaminated by unobserved confounding variables. Specifically, our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values, which is then used as the potential in the Potential-Based Reward Shaping (PBRS) framework. Our proposed reward shaping algorithm is tested with Soft Actor-Critic (SAC) on multiple commonly used continuous control benchmarks and exhibits strong performance even under unobserved confounders. More broadly, our work marks a solid first step towards confounding-robust continuous control from a causal perspective. Code for training our reward shaping functions can be found at https://github.com/mateojuliani/confounding_robust_cont_control.
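As context for the abstract, the PBRS transformation it refers to can be sketched as follows. This is a minimal illustration of the standard shaping rule r' = r + γΦ(s') − Φ(s) (Ng et al., 1999), not the authors' implementation: the paper's potential Φ is a learned upper bound on the optimal state value, whereas the toy `phi` below is a placeholder assumption.

```python
import numpy as np

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s).

    PBRS leaves the optimal policy unchanged for any potential phi;
    in the paper, phi would be the learned causal upper bound on V*.
    """
    return r + gamma * phi(s_next) - phi(s)

# Toy potential (assumption for illustration only):
# negative distance to a goal at the origin.
phi = lambda s: -np.linalg.norm(s)

# Moving closer to the goal earns a shaping bonus on top of r = 1.0.
r_shaped = shaped_reward(1.0, np.array([1.0, 0.0]), np.array([0.5, 0.0]), phi)
print(r_shaped)  # 1.0 + 0.99 * (-0.5) - (-1.0) = 1.505
```

Because the shaping term telescopes along trajectories, any such Φ preserves policy optimality; the paper's contribution lies in choosing Φ so that it remains informative under unobserved confounding.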