AI Summary
This work addresses temporal uncertainty in real-world tasks, where the effects of actions are often subject to stochastic delays, a setting in which existing reinforcement learning methods struggle unless they have prior knowledge of the delay distribution or access to non-delayed data. The paper proposes a causal hierarchical reinforcement learning framework that, for the first time, integrates causal modeling with a delay-aware empowerment objective. The framework explicitly models both the causal structure of state transitions and the distribution of their random delays, and it guides the agent to actively explore highly controllable states. This removes the fixed-delay assumption made by prior hierarchical methods. In environments with stochastic delays, namely modified versions of 2D-Minecraft and MiniGrid, the framework substantially outperforms baselines, making decision-making more robust under temporal uncertainty.
Abstract
Many real-world tasks involve delayed effects, where the outcomes of actions emerge after varying time lags. Existing delay-aware reinforcement learning methods often rely on state augmentation, prior knowledge of delay distributions, or access to non-delayed data, limiting their generalization. Hierarchical reinforcement learning, by contrast, inherently offers advantages in handling delays due to its hierarchical structure, yet existing methods are restricted to fixed delays. To address these limitations, we propose Delay-Empowered Causal Hierarchical Reinforcement Learning (DECHRL). DECHRL explicitly models both the causal structure of state transitions and their associated stochastic delay distributions. These are then incorporated into a delay-aware empowerment objective that drives proactive exploration toward highly controllable states, thereby improving performance under temporal uncertainty. We evaluate DECHRL in modified 2D-Minecraft and MiniGrid environments featuring stochastic delays. Experimental results show that DECHRL effectively models temporal delays and significantly outperforms baselines in decision-making under temporal uncertainty.
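To make the idea of a delay-aware empowerment objective more concrete, the sketch below shows one way a stochastic delay distribution can lower a state's measurable controllability within a finite horizon. It is an illustration under stated assumptions, not DECHRL's actual objective: empowerment is taken in its standard sense as the channel capacity between actions and future observations, and it is lower-bounded here by the mutual information under a uniform action distribution. The toy channel (each action has one distinct effect that lands after a random delay), the function names, and the `horizon` parameter are all hypothetical.

```python
import math


def mutual_information(p_a, p_o_given_a):
    """Return I(A; O) in bits for a discrete channel p(o | a) with action prior p(a)."""
    n_out = len(p_o_given_a[0])
    # Marginal over outcomes: p(o) = sum_a p(a) * p(o | a)
    p_o = [
        sum(p_a[a] * p_o_given_a[a][o] for a in range(len(p_a)))
        for o in range(n_out)
    ]
    mi = 0.0
    for a, pa in enumerate(p_a):
        for o, p in enumerate(p_o_given_a[a]):
            if pa > 0 and p > 0:
                mi += pa * p * math.log2(p / p_o[o])
    return mi


def delayed_channel(delay_pmfs, horizon):
    """Build p(o | a) for a toy setting (an assumption, not the paper's model).

    delay_pmfs[a] maps delay (in steps) -> probability for action a.
    Outcome a means "the effect of action a was observed by `horizon`";
    the last outcome means "nothing observed yet".
    """
    n = len(delay_pmfs)
    rows = []
    for a, pmf in enumerate(delay_pmfs):
        arrived = sum(p for d, p in pmf.items() if d <= horizon)
        row = [0.0] * (n + 1)
        row[a] = arrived
        row[n] = 1.0 - arrived
        rows.append(row)
    return rows


def empowerment_lower_bound(delay_pmfs, horizon):
    """Uniform-action mutual information, a lower bound on channel capacity."""
    n = len(delay_pmfs)
    uniform = [1.0 / n] * n
    return mutual_information(uniform, delayed_channel(delay_pmfs, horizon))


# Two actions whose effects land after 1 step or 3 steps with equal probability.
pmfs = [{1: 0.5, 3: 0.5}, {1: 0.5, 3: 0.5}]
for h in (0, 2, 3):
    print(h, empowerment_lower_bound(pmfs, h))
```

In this toy setting the bound grows as the horizon covers more of the delay distribution, which is the intuition behind letting modeled delays shape which states count as highly controllable.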