Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline reinforcement learning suffers from performance degradation in sparse-reward and long-horizon tasks due to out-of-distribution (OOD) samples, a challenge inadequately addressed by existing offline model-based RL (MBRL) approaches. To tackle this, we propose a latent-space temporal augmentation framework that learns a temporally structured latent space and jointly models trajectory-level and transition-level temporal distances—enabling long-horizon behavioral abstraction and controllable transition augmentation. By integrating latent-space dynamics modeling with distance-aware representation learning, our method enhances the plausibility and generalizability of synthetic data. Evaluated on D4RL benchmarks—including AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen—our approach significantly outperforms prior offline MBRL baselines and achieves performance on par with or superior to diffusion-based trajectory augmentation and goal-conditioned RL methods.
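To make the "distance-aware representation learning" idea concrete, the following is a minimal sketch, assuming offline trajectories are available as a list of [T, state_dim] tensors. An encoder is trained so that latent Euclidean distance regresses the (log-scaled) temporal gap between two states from the same trajectory. All names here (TemporalEncoder, sample_state_pairs, temporal_distance_loss) are hypothetical and not the paper's API; TempDATA's actual objective additionally couples trajectory-level and transition-level distances.

```python
# Minimal sketch of temporal-distance representation learning (hypothetical names,
# not the paper's implementation).
import torch
import torch.nn as nn


class TemporalEncoder(nn.Module):
    """Maps raw states to a latent space where distances reflect time-to-reach."""

    def __init__(self, state_dim: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)


def sample_state_pairs(trajectories, batch_size: int, max_gap: int = 50):
    """Sample (s_t, s_{t+k}, k) triples from random offline trajectories."""
    s_now, s_future, gaps = [], [], []
    for _ in range(batch_size):
        traj = trajectories[torch.randint(len(trajectories), (1,)).item()]
        t = torch.randint(0, len(traj) - 1, (1,)).item()
        k = torch.randint(1, min(max_gap, len(traj) - t), (1,)).item()
        s_now.append(traj[t])
        s_future.append(traj[t + k])
        gaps.append(float(k))
    return torch.stack(s_now), torch.stack(s_future), torch.tensor(gaps)


def temporal_distance_loss(encoder, s_now, s_future, gap):
    """Regress latent distance onto log(1 + temporal gap)."""
    dist = torch.norm(encoder(s_now) - encoder(s_future), dim=-1)
    return ((dist - torch.log1p(gap)) ** 2).mean()


# Toy usage: one gradient step on a synthetic dataset.
trajectories = [torch.randn(100, 8) for _ in range(16)]
encoder = TemporalEncoder(state_dim=8)
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
s_now, s_future, gap = sample_state_pairs(trajectories, batch_size=256)
loss = temporal_distance_loss(encoder, s_now, s_future, gap)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```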

📝 Abstract
The goal of offline reinforcement learning (RL) is to extract a high-performance policy from fixed datasets while minimizing performance degradation due to out-of-distribution (OOD) samples. Offline model-based RL (MBRL) is a promising approach that ameliorates OOD issues by enriching state-action transitions with augmentations synthesized via a learned dynamics model. Unfortunately, seminal offline MBRL methods often struggle in sparse-reward, long-horizon tasks. In this work, we introduce a novel MBRL framework, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), that generates augmented transitions in a temporally structured latent space rather than in raw state space. To model long-horizon behavior, TempDATA learns a latent abstraction that captures temporal distance at both the trajectory and transition levels of the state space. Our experiments confirm that TempDATA outperforms previous offline MBRL methods and matches or surpasses the performance of diffusion-based trajectory augmentation and goal-conditioned RL on the D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen benchmarks.
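The abstract's "augmented transitions in a temporally structured latent space" follow the usual model-based augmentation pattern: branch short imagined rollouts from real dataset states and mix them with the real data when training the agent. Below is a minimal sketch of that step, assuming an encoder, a latent dynamics model, a reward model, and a policy have already been trained; the function and argument names are hypothetical, not TempDATA's API.

```python
# Minimal sketch of latent-space transition augmentation (hypothetical names,
# not the paper's implementation).
import torch


@torch.no_grad()
def augment_transitions(encoder, dynamics, reward_model, policy,
                        dataset_states, rollout_len: int = 5,
                        batch_size: int = 256):
    """Roll out the learned latent dynamics model from encoded dataset states."""
    idx = torch.randint(len(dataset_states), (batch_size,))
    z = encoder(dataset_states[idx])          # encode real states into the latent space
    synthetic = []
    for _ in range(rollout_len):
        action = policy(z)                    # action proposed by the current policy
        z_next = dynamics(z, action)          # predicted next latent state
        reward = reward_model(z, action)      # predicted reward for the latent transition
        synthetic.append((z, action, reward, z_next))
        z = z_next                            # continue the imagined rollout
    return synthetic                          # later mixed with real transitions for agent training
```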
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in offline RL due to OOD samples
Improves performance on sparse-reward, long-horizon tasks in model-based RL
Generates temporally structured transitions in latent space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal distance-aware transition augmentation in latent space
Latent abstraction for long-horizon behavior modeling
Outperforms offline MBRL and diffusion-based methods