State-Action Inpainting Diffuser for Continuous Control with Delay

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the temporal misalignment between perception and action in continuous control caused by signal delays. To tackle this challenge, the authors propose a sequence inpainting approach based on generative diffusion models, which formulates delayed control as a joint state-action sequence restoration task. By implicitly learning environment dynamics, the method directly generates temporally consistent plans, operating at the intersection of the model-based and model-free reinforcement learning paradigms. The proposed framework combines the inductive bias of dynamics modeling with the end-to-end training advantages of policy optimization, making it applicable in both online and offline settings. Empirical evaluations demonstrate state-of-the-art performance across multiple delayed continuous-control benchmarks, with notable gains in robustness and generalization.

📝 Abstract
Signal delay poses a fundamental challenge in continuous control and reinforcement learning (RL) by introducing a temporal gap between interaction and perception. Current solutions have largely evolved along two distinct paradigms: model-free approaches which utilize state augmentation to preserve Markovian properties, and model-based methods which focus on inferring latent beliefs via dynamics modeling. In this paper, we bridge these perspectives by introducing State-Action Inpainting Diffuser (SAID), a framework that integrates the inductive bias of dynamics learning with the direct decision-making capability of policy optimization. By formulating the problem as a joint sequence inpainting task, SAID implicitly captures environmental dynamics while directly generating consistent plans, effectively operating at the intersection of model-based and model-free paradigms. Crucially, this generative formulation allows SAID to be seamlessly applied to both online and offline RL. Extensive experiments on delayed continuous control benchmarks demonstrate that SAID achieves state-of-the-art and robust performance. Our study suggests a new methodology to advance the field of RL with delay.
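The abstract's core idea, casting delayed control as joint state-action sequence inpainting, can be illustrated with a minimal sketch. This is not the paper's implementation: the denoiser is a toy stand-in for a learned diffusion model, and all names, dimensions, and the RePaint-style clamping of observed entries are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: a trajectory is a (horizon, state_dim + action_dim)
# array. Due to signal delay, only the oldest H - DELAY states are observed;
# recent states and all actions must be "inpainted" by reverse diffusion.
H, S_DIM, A_DIM = 8, 3, 2
DELAY = 2          # number of most recent, unobserved time steps
T_STEPS = 50       # reverse-diffusion steps

rng = np.random.default_rng(0)

def toy_denoiser(x, t):
    """Stand-in for a learned noise-conditioned denoiser; it merely shrinks
    the sample each step, mimicking the shape of a denoising update."""
    return x * (1.0 - 1.0 / T_STEPS)

def inpaint(observed, mask):
    """Joint state-action inpainting: denoise the whole sequence, then
    overwrite the known (delayed) entries with their observed values,
    so generation stays consistent with what was actually perceived."""
    x = rng.standard_normal(observed.shape)      # start from pure noise
    for t in range(T_STEPS, 0, -1):
        x = toy_denoiser(x, t)                   # denoise states and actions jointly
        x[mask] = observed[mask]                 # clamp observed delayed states
    return x

# Observed part: states up to time H - DELAY; everything else is unknown.
traj = rng.standard_normal((H, S_DIM + A_DIM))
known = np.zeros((H, S_DIM + A_DIM), dtype=bool)
known[: H - DELAY, :S_DIM] = True

plan = inpaint(traj, known)
assert np.allclose(plan[known], traj[known])     # observations are preserved
```

The clamping step is what lets a single generative model play both roles the abstract describes: the denoiser implicitly encodes dynamics across the state columns, while the action columns it fills in constitute the plan itself.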

Problem

Research questions and friction points this paper is trying to address.

signal delay
continuous control
reinforcement learning
Markovian property
temporal gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

State-Action Inpainting
Diffusion Model
Continuous Control with Delay
Model-Based and Model-Free RL
Sequence Inpainting