🤖 AI Summary
In offline reinforcement learning, suboptimal datasets often lack high-return trajectories, limiting policy learning performance. To address this, we propose GODA (Goal-cOnditioned Data Augmentation), a goal-conditioned diffusion-based data augmentation framework. Our key contributions are: (1) a novel return-oriented goal condition that explicitly uses target return as the generation objective; (2) an adaptive gated conditioning mechanism that jointly processes noised inputs and goal-conditioned guidance; and (3) a controllable scaling sampling strategy that strengthens return-based guidance when generating high-return transitions. Evaluated on the D4RL benchmark and a real-world traffic signal control task, GODA consistently improves the performance of multiple offline RL algorithms, including BCQ and CQL, outperforming existing state-of-the-art data augmentation methods across all settings.
📝 Abstract
Offline reinforcement learning (RL) enables policy learning from pre-collected datasets, removing the need to interact directly with the environment. However, its performance is bounded by dataset quality: it generally fails to learn high-quality policies from suboptimal datasets. To address datasets with insufficient optimal demonstrations, we introduce Goal-cOnditioned Data Augmentation (GODA), a novel goal-conditioned diffusion-based method for augmenting samples of higher quality. Leveraging recent advances in generative modeling, GODA incorporates a novel return-oriented goal condition with various selection mechanisms. Specifically, we introduce a controllable scaling technique that strengthens return-based guidance during sampling. GODA learns a comprehensive representation of the original dataset distribution while generating new data conditioned on selectively higher-return goals, thereby maximizing the utility of limited optimal demonstrations. Furthermore, we propose a novel adaptive gated conditioning method for processing noised inputs and conditions, improving the capture of goal-oriented guidance. We conduct experiments on the D4RL benchmark and a real-world challenge, traffic signal control (TSC), to demonstrate GODA's effectiveness in improving data quality and its superior performance over state-of-the-art data augmentation methods across various offline RL algorithms.
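To make the sampling idea concrete, the sketch below shows one way return-oriented guidance with controllable scaling can be wired into a diffusion sampler. This is a minimal illustration, not the paper's implementation: the `denoise` callable, the `return_scale` and `guidance_scale` parameters, and the simplified update rule (schedule constants omitted) are all assumptions made for clarity.

```python
import numpy as np

def sample_with_return_guidance(denoise, x_T, returns, steps=50,
                                guidance_scale=1.5, return_scale=1.2):
    """Hypothetical sketch of return-guided diffusion sampling.

    The denoiser is conditioned on a scaled-up target return, and the
    conditional / unconditional noise predictions are blended in the
    style of classifier-free guidance.
    """
    x = x_T
    # Controllable scaling: amplify the return condition beyond the
    # dataset's observed returns to steer generation toward
    # higher-return transitions.
    goal = return_scale * returns
    for t in reversed(range(steps)):
        eps_uncond = denoise(x, t, cond=None)   # unconditional prediction
        eps_cond = denoise(x, t, cond=goal)     # return-conditioned prediction
        # Blend the two predictions; guidance_scale controls how
        # strongly the return condition shapes the sample.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        # Simplified DDPM-style update (noise-schedule terms omitted).
        x = x - eps / steps
    return x
```

In practice the generated samples (e.g. state-action-reward-next-state tuples) would be appended to the offline dataset before training the downstream RL algorithm.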