🤖 AI Summary
In hierarchical reinforcement learning (HRL), automatically discovering and effectively leveraging subgoal hierarchies remains challenging under long-horizon tasks with sparse rewards.
Method: This paper proposes a causal-graph-based HRL framework that, for the first time, integrates causal discovery algorithms into subgoal modeling to automatically construct a directed causal graph over subgoals. It further introduces a theoretically grounded directional causal intervention mechanism, enabling the high-level policy to select and execute subgoals in an interpretable, low-variance manner. The framework accommodates both tree-structured and general directed acyclic graph (DAG) hierarchies, supporting rigorous theoretical analysis.
Contribution/Results: Experiments on multiple standard HRL benchmarks demonstrate that the proposed method significantly outperforms existing baselines, improving sample efficiency by 30–50%. These results validate the dual benefits of causal structural modeling and intervention—enhancing both performance and interpretability in HRL.
📝 Abstract
Hierarchical reinforcement learning (HRL) improves the efficiency of long-horizon reinforcement-learning tasks with sparse rewards by decomposing the task into a hierarchy of subgoals. The main challenge of HRL is efficient discovery of the hierarchical structure among subgoals and utilizing this structure to achieve the final goal. We address this challenge by modeling the subgoal structure as a causal graph and propose a causal discovery algorithm to learn it. Additionally, rather than intervening on the subgoals at random during exploration, we harness the discovered causal model to prioritize subgoal interventions based on their importance in attaining the final goal. These targeted interventions result in a significantly more efficient policy in terms of the training cost. Unlike previous work on causal HRL, which lacked theoretical analysis, we provide a formal analysis of the problem. Specifically, for tree structures and, for a variant of Erdős-Rényi random graphs, our approach results in remarkable improvements. Our experimental results on HRL tasks also illustrate that our proposed framework outperforms existing work in terms of training cost.