Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline reinforcement learning suffers from performance degradation in sparse-reward and long-horizon tasks due to out-of-distribution (OOD) samples, a challenge inadequately addressed by existing offline model-based RL (MBRL) approaches. To tackle this, we propose a latent-space temporal augmentation framework that learns a temporally structured latent space and jointly models trajectory-level and transition-level temporal distances—enabling long-horizon behavioral abstraction and controllable transition augmentation. By integrating latent-space dynamics modeling with distance-aware representation learning, our method enhances the plausibility and generalizability of synthetic data. Evaluated on D4RL benchmarks—including AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen—our approach significantly outperforms prior offline MBRL baselines and achieves performance on par with or superior to diffusion-based trajectory augmentation and goal-conditioned RL methods.
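To make the "distance-aware representation learning" idea concrete, the following is a minimal sketch, assuming offline trajectories are available as a list of [T, state_dim] tensors. An encoder is trained so that latent Euclidean distance regresses the (log-scaled) temporal gap between two states from the same trajectory. All names here (TemporalEncoder, sample_state_pairs, temporal_distance_loss) are hypothetical and not the paper's API; TempDATA's actual objective additionally couples trajectory-level and transition-level distances.

```python
# Minimal sketch of temporal-distance representation learning (hypothetical names,
# not the paper's implementation).
import torch
import torch.nn as nn


class TemporalEncoder(nn.Module):
    """Maps raw states to a latent space where distances reflect time-to-reach."""

    def __init__(self, state_dim: int, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)


def sample_state_pairs(trajectories, batch_size: int, max_gap: int = 50):
    """Sample (s_t, s_{t+k}, k) triples from random offline trajectories."""
    s_now, s_future, gaps = [], [], []
    for _ in range(batch_size):
        traj = trajectories[torch.randint(len(trajectories), (1,)).item()]
        t = torch.randint(0, len(traj) - 1, (1,)).item()
        k = torch.randint(1, min(max_gap, len(traj) - t), (1,)).item()
        s_now.append(traj[t])
        s_future.append(traj[t + k])
        gaps.append(float(k))
    return torch.stack(s_now), torch.stack(s_future), torch.tensor(gaps)


def temporal_distance_loss(encoder, s_now, s_future, gap):
    """Regress latent distance onto log(1 + temporal gap)."""
    dist = torch.norm(encoder(s_now) - encoder(s_future), dim=-1)
    return ((dist - torch.log1p(gap)) ** 2).mean()


# Toy usage: one gradient step on a synthetic dataset.
trajectories = [torch.randn(100, 8) for _ in range(16)]
encoder = TemporalEncoder(state_dim=8)
optimizer = torch.optim.Adam(encoder.parameters(), lr=3e-4)
s_now, s_future, gap = sample_state_pairs(trajectories, batch_size=256)
loss = temporal_distance_loss(encoder, s_now, s_future, gap)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```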

📝 Abstract
The goal of offline reinforcement learning (RL) is to extract a high-performance policy from fixed datasets while minimizing performance degradation due to out-of-distribution (OOD) samples. Offline model-based RL (MBRL) is a promising approach that ameliorates OOD issues by enriching state-action transitions with augmentations synthesized via a learned dynamics model. Unfortunately, seminal offline MBRL methods often struggle in sparse-reward, long-horizon tasks. In this work, we introduce a novel MBRL framework, dubbed Temporal Distance-Aware Transition Augmentation (TempDATA), that generates augmented transitions in a temporally structured latent space rather than in raw state space. To model long-horizon behavior, TempDATA learns a latent abstraction that captures temporal distance at both the trajectory and transition levels of the state space. Our experiments confirm that TempDATA outperforms previous offline MBRL methods and matches or surpasses the performance of diffusion-based trajectory augmentation and goal-conditioned RL on the D4RL AntMaze, FrankaKitchen, CALVIN, and pixel-based FrankaKitchen benchmarks.
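The abstract's "augmented transitions in a temporally structured latent space" follow the usual model-based augmentation pattern: branch short imagined rollouts from real dataset states and mix them with the real data when training the agent. Below is a minimal sketch of that step, assuming an encoder, a latent dynamics model, a reward model, and a policy have already been trained; the function and argument names are hypothetical, not TempDATA's API.

```python
# Minimal sketch of latent-space transition augmentation (hypothetical names,
# not the paper's implementation).
import torch


@torch.no_grad()
def augment_transitions(encoder, dynamics, reward_model, policy,
                        dataset_states, rollout_len: int = 5,
                        batch_size: int = 256):
    """Roll out the learned latent dynamics model from encoded dataset states."""
    idx = torch.randint(len(dataset_states), (batch_size,))
    z = encoder(dataset_states[idx])          # encode real states into the latent space
    synthetic = []
    for _ in range(rollout_len):
        action = policy(z)                    # action proposed by the current policy
        z_next = dynamics(z, action)          # predicted next latent state
        reward = reward_model(z, action)      # predicted reward for the latent transition
        synthetic.append((z, action, reward, z_next))
        z = z_next                            # continue the imagined rollout
    return synthetic                          # later mixed with real transitions for agent training
```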
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in offline RL due to OOD samples
Improves performance on sparse-reward, long-horizon tasks in model-based RL
Generates temporally structured transitions in latent space
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal distance-aware transition augmentation in latent space
Latent abstraction for long-horizon behavior modeling
Outperforms offline MBRL and diffusion-based methods