Model-Based Reinforcement Learning under Random Observation Delays

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world sensor observations often suffer from stochastic delays and out-of-order arrivals, whereas standard reinforcement learning assumes instantaneous observations, and existing approaches inadequately model such delays within the partially observable Markov decision process (POMDP) framework. This paper formally models stochastic observation delay within POMDPs for the first time and introduces a sequential belief-state update mechanism that fuses delayed, out-of-order observations, overcoming the limitations of conventional history-stacking methods. The mechanism is robust to shifts in the delay distribution and integrates into model-based RL frameworks such as Dreamer. Experiments on simulated robotic control tasks show that the method outperforms existing baselines and heuristic strategies across diverse delay patterns, including stochastic, bursty, and heavy-tailed delays, and generalizes across delay regimes.
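The core idea of sequentially fusing out-of-order observations into a belief state can be illustrated with a minimal sketch. This is a hypothetical 1-D linear-Gaussian stand-in, not the paper's mechanism (which operates on a learned world model such as Dreamer's): observations are buffered with their generation timestamps, and the belief is recomputed in generation order whenever a late observation arrives, so arrival order does not matter.

```python
class DelayAwareFilter:
    """Toy Gaussian belief filter fusing delayed, out-of-order observations.

    Hypothetical illustration: the latent state follows a 1-D random walk
    x_t = x_{t-1} + w, w ~ N(0, q); observations y_t = x_t + v, v ~ N(0, r)
    may arrive late and out of generation order. On each arrival the filter
    is re-run from the prior over all observations sorted by generation time,
    so the resulting belief is invariant to arrival order.
    """

    def __init__(self, prior_mean=0.0, prior_var=1.0, q=0.1, r=0.5):
        self.prior_mean, self.prior_var = prior_mean, prior_var
        self.q, self.r = q, r
        self.buffer = []  # (generation_time, observation) pairs

    def receive(self, t_gen, y):
        """Ingest an observation generated at time t_gen (possibly late)."""
        self.buffer.append((t_gen, y))
        self.buffer.sort(key=lambda pair: pair[0])

    def belief(self, t_now):
        """Posterior mean/variance of x at t_now given all received obs."""
        mean, var, t_prev = self.prior_mean, self.prior_var, 0
        for t_gen, y in self.buffer:
            var += self.q * (t_gen - t_prev)  # predict forward to obs time
            k = var / (var + self.r)          # Kalman gain
            mean += k * (y - mean)            # measurement update
            var *= 1.0 - k
            t_prev = t_gen
        var += self.q * (t_now - t_prev)      # predict to the current time
        return mean, var
```

Replaying from a checkpoint in generation order is what history stacking cannot do: a stacked-frame policy sees observations in arrival order and has no way to re-integrate a late frame at its true place in the trajectory.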

📝 Abstract
Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive out-of-sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach to delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to delay distribution shifts during deployment. Additionally, we present experiments on simulated robotic tasks, comparing our method to common practical heuristics and emphasizing the importance of explicitly modeling observation delays.
Problem

Research questions and friction points this paper is trying to address.

Addressing random observation delays in reinforcement learning environments
Solving out-of-sequence sensor data issues in POMDP settings
Overcoming limitations of naive approaches for delayed observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-based filtering process for belief state updates
Delay-aware framework integrated with model-based RL
Robust performance under delay distribution shifts