🤖 AI Summary
This work addresses the lack of robustness in existing reward shaping methods for offline continuous control when unobserved confounders are present. It introduces causal inference into automatic reward shaping for the first time, leveraging a causal Bellman equation to construct a tight upper bound on the optimal state value, which is then employed as a potential function within the potential-based reward shaping (PBRS) framework. Integrated with Soft Actor-Critic (SAC) under offline reinforcement learning settings, the proposed approach effectively mitigates confounding bias and significantly enhances both the stability and performance of policy training. Empirical results demonstrate that the method maintains strong robustness and superior performance across multiple standard continuous control benchmarks, even in the presence of unobserved confounders, thereby offering a novel causal perspective for offline reinforcement learning.
📝 Abstract
Reward shaping has been widely applied to accelerate the training of Reinforcement Learning (RL) agents. However, a principled way of designing effective reward shaping functions, especially for complex continuous control problems, remains largely under-explored. In this work, we propose to automatically learn a reward shaping function for continuous control problems from offline datasets, potentially contaminated by unobserved confounding variables. Specifically, our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values, which is then used as the potential in the Potential-Based Reward Shaping (PBRS) framework. Our proposed reward shaping algorithm is tested with Soft Actor-Critic (SAC) on multiple commonly used continuous control benchmarks and exhibits strong performance even under unobserved confounders. More broadly, our work marks a solid first step towards confounding-robust continuous control from a causal perspective. Code for training our reward shaping functions can be found at https://github.com/mateojuliani/confounding_robust_cont_control.
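As context for the abstract, the PBRS transformation it refers to can be sketched as follows. This is a minimal illustration of the standard shaping rule r' = r + γΦ(s') − Φ(s) (Ng et al., 1999), not the authors' implementation: the paper's potential Φ is a learned upper bound on the optimal state value, whereas the toy `phi` below is a placeholder assumption.

```python
import numpy as np

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma * phi(s') - phi(s).

    PBRS leaves the optimal policy unchanged for any potential phi;
    in the paper, phi would be the learned causal upper bound on V*.
    """
    return r + gamma * phi(s_next) - phi(s)

# Toy potential (assumption for illustration only):
# negative distance to a goal at the origin.
phi = lambda s: -np.linalg.norm(s)

# Moving closer to the goal earns a shaping bonus on top of r = 1.0.
r_shaped = shaped_reward(1.0, np.array([1.0, 0.0]), np.array([0.5, 0.0]), phi)
print(r_shaped)  # 1.0 + 0.99 * (-0.5) - (-1.0) = 1.505
```

Because the shaping term telescopes along trajectories, any such Φ preserves policy optimality; the paper's contribution lies in choosing Φ so that it remains informative under unobserved confounding.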