Delay-Empowered Causal Hierarchical Reinforcement Learning

2026-05-12
Citations: 0
Influential: 0

AI Summary
This work addresses temporal uncertainty in real-world tasks, where the effects of actions are often subject to stochastic delays. In this setting, existing reinforcement learning methods struggle without prior knowledge of the delay distribution or access to non-delayed data. The paper proposes a causal hierarchical reinforcement learning framework that, for the first time, integrates causal modeling with a delay-aware empowerment objective. The framework explicitly models both the causal structure of state transitions and the distribution of random delays, and it guides the agent to actively explore highly controllable states. This removes the fixed-delay assumption of prior hierarchical methods. In environments with stochastic delays, such as 2D-Minecraft and MiniGrid, the proposed framework substantially outperforms baselines, making decisions more robust under temporal uncertainty.
Abstract
Many real-world tasks involve delayed effects, where the outcomes of actions emerge after varying time lags. Existing delay-aware reinforcement learning methods often rely on state augmentation, prior knowledge of delay distributions, or access to non-delayed data, limiting their generalization. Hierarchical reinforcement learning, by contrast, inherently offers advantages in handling delays due to its hierarchical structure, yet existing methods are restricted to fixed delays. To address these limitations, we propose Delay-Empowered Causal Hierarchical Reinforcement Learning (DECHRL). DECHRL explicitly models both the causal structure of state transitions and their associated stochastic delay distributions. These are then incorporated into a delay-aware empowerment objective that drives proactive exploration toward highly controllable states, thereby improving performance under temporal uncertainty. We evaluate DECHRL in modified 2D-Minecraft and MiniGrid environments featuring stochastic delays. Experimental results show that DECHRL effectively models temporal delays and significantly outperforms baselines in decision-making under temporal uncertainty.
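As a rough illustration of the stochastic-delay setting described above, the following minimal Python sketch holds each action's effect back for a randomly sampled number of steps before it becomes visible. It is not DECHRL and does not reproduce the paper's environments. The class name `DelayedEffectQueue`, the `max_delay` parameter, the uniform delay distribution, and the example effect name are all assumptions made for this example.

```python
import heapq
import random


class DelayedEffectQueue:
    """Holds action effects until a randomly sampled delay has elapsed.

    Hypothetical helper for illustration only; the uniform delay
    distribution is an assumption, not the paper's delay model.
    """

    def __init__(self, max_delay, seed=0):
        self.max_delay = max_delay
        self.rng = random.Random(seed)
        self.now = 0
        self._pending = []  # min-heap of (due_step, insertion_order, effect)
        self._count = 0

    def submit(self, effect):
        """Schedule an effect with a delay drawn uniformly from 1..max_delay."""
        delay = self.rng.randint(1, self.max_delay)
        heapq.heappush(self._pending, (self.now + delay, self._count, effect))
        self._count += 1
        return delay

    def step(self):
        """Advance one time step and return the effects that are now due."""
        self.now += 1
        released = []
        while self._pending and self._pending[0][0] <= self.now:
            released.append(heapq.heappop(self._pending)[2])
        return released


if __name__ == "__main__":
    queue = DelayedEffectQueue(max_delay=3, seed=0)
    delay = queue.submit("craft_item")  # hypothetical effect name
    for t in range(1, 4):
        print(t, queue.step())  # the effect appears at step t == delay
```

From the agent's point of view, the outcome of an action arrives at an unknown later step, which is why a method that assumes a single fixed delay can misattribute effects to the wrong action.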
Problem

Research questions and friction points this paper is trying to address.

delayed effects
stochastic delays
temporal uncertainty
hierarchical reinforcement learning
delay-aware reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

delay-aware reinforcement learning
causal hierarchical reinforcement learning
stochastic delays
empowerment
temporal uncertainty
Chenran Zhao
College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Dianxi Shi
College of Computer Science and Technology, National University of Defense Technology, Changsha, China; Intelligent Game and Decision Lab (IGDL), Beijing, China
Haotian Wang
National University of Defense Technology
Causal Inference, Strategic Learning
Mengzhu Wang
National University of Defense Technology
Transfer Learning, Computer Vision
Yaowen Zhang
Institute of Military Transportation, Tianjin, China
Chunping Qiu
Intelligent Game and Decision Lab (IGDL), Beijing, China
Shaowu Yang
College of Computer Science and Technology, National University of Defense Technology, Changsha, China