Offline Behavioral Data Selection

📅 2025-12-20
🤖 AI Summary
Offline behavioral cloning trains inefficiently when scaled to large datasets: empirical analysis shows that policy performance saturates rapidly on small subsets and that test loss correlates only weakly with actual policy performance. Method: We propose Stepwise Dual Ranking (SDR), a data selection paradigm that combines stage-wise prioritized truncation with two ranking criteria: Q-value estimation (for action optimality) and state-density estimation (for state coverage). Contribution/Results: SDR is presented as the first method to systematically identify and exploit performance saturation in offline RL datasets. On the D4RL benchmark, it achieves comparable or superior policy performance using only 10–20% of the full dataset, substantially improving data efficiency and training speed. The framework is interpretable, principled, and scalable, offering a general-purpose data refinement approach for efficient offline reinforcement learning.
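
The dual-ranking criterion can be illustrated with a minimal sketch. It assumes the Q-value and state-density estimates are already available as per-sample arrays and combines them by summing the two ranks; the rank-sum rule, the helper name `dual_rank_select`, and its `fraction` parameter are hypothetical choices for illustration, not the paper's published procedure.

```python
import numpy as np


def dual_rank_select(q_values, densities, fraction):
    """Hypothetical dual-ranking selector (illustrative, not the paper's code).

    Each sample gets two 0-based ranks: rank 0 for the highest Q-value and
    rank 0 for the lowest state density. The ranks are summed, and roughly
    `fraction` of the samples with the smallest sum are returned as sorted indices.
    """
    q_values = np.asarray(q_values, dtype=float)
    densities = np.asarray(densities, dtype=float)
    if q_values.shape != densities.shape or q_values.ndim != 1:
        raise ValueError("q_values and densities must be 1-D arrays of equal length")
    n = q_values.size
    budget = min(n, max(1, int(round(fraction * n))))

    # argsort of an argsort converts raw scores into 0-based ranks.
    q_rank = np.argsort(np.argsort(-q_values, kind="stable"), kind="stable")
    density_rank = np.argsort(np.argsort(densities, kind="stable"), kind="stable")

    # Assumed combination rule: equal-weight sum of the two ranks.
    combined = q_rank + density_rank
    keep = np.argsort(combined, kind="stable")[:budget]
    return np.sort(keep)


q = [0.9, 0.2, 0.8, 0.1]  # toy Q-value estimates
d = [0.1, 0.5, 0.3, 0.9]  # toy state-density estimates
print(dual_rank_select(q, d, fraction=0.5))  # [0 2]
```

Samples 0 and 2 are kept because they pair high Q-values with low densities. With a 10% budget (`fraction=0.1`) on 1,000,000 transitions, the helper would keep 100,000 samples.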

📝 Abstract
Behavioral cloning is a widely adopted approach for offline policy learning from expert demonstrations. However, the large scale of offline behavioral datasets often makes training computationally intensive when these datasets are used in downstream tasks. In this paper, we uncover a striking data saturation in offline behavioral data: policy performance rapidly saturates when trained on a small fraction of the dataset. We attribute this effect to the weak alignment between policy performance and test loss, which reveals substantial room for improvement through data selection. To this end, we propose a simple yet effective method, Stepwise Dual Ranking (SDR), which extracts a compact yet informative subset from large-scale offline behavioral datasets. SDR is built on two key principles: (1) stepwise clip, which prioritizes early-stage data; and (2) dual ranking, which selects samples with both high action-value rank and low state-density rank. Extensive experiments and ablation studies on D4RL benchmarks demonstrate that SDR significantly enhances data selection for offline behavioral data.
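
Stepwise clipping can be pictured with a small sketch, under the assumption that each transition carries its within-episode step index and that "early-stage" means a small step index. The sketch raises a clip horizon one step at a time until enough early transitions are retained; `stepwise_clip` and `pool_size` are hypothetical names, and the paper's exact clipping schedule may differ.

```python
import numpy as np


def stepwise_clip(timesteps, pool_size):
    """Hypothetical stepwise clip toward early-stage data (illustrative only).

    `timesteps[i]` is the non-negative within-episode step index of
    transition i. The clip horizon starts at 0 and grows one step at a time
    until the transitions with timestep < horizon number at least `pool_size`;
    their indices are returned in dataset order.
    """
    timesteps = np.asarray(timesteps)
    if pool_size >= timesteps.size:
        return np.arange(timesteps.size)  # budget covers everything: no clipping
    horizon = 0
    mask = timesteps < horizon
    while mask.sum() < pool_size:
        horizon += 1
        mask = timesteps < horizon
    return np.flatnonzero(mask)


# Two toy episodes of lengths 4 and 3, flattened into one dataset.
steps = [0, 1, 2, 3, 0, 1, 2]
print(stepwise_clip(steps, pool_size=4))  # [0 1 4 5]
```

Here the horizon stops at 2, so only the first two steps of each episode survive. Clipping per step index rather than per episode keeps every trajectory represented in the retained pool.
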
Problem

Research questions and friction points this paper is trying to address.

Reduces the computational cost of training on large offline datasets
Addresses the weak alignment between policy performance and test loss
Selects compact, informative subsets via stepwise clipping and dual ranking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepwise Dual Ranking for data selection
Prioritizes early-stage data via stepwise clipping
Selects samples via action-value and state-density ranking
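
The summary does not say how state density is estimated. One common stand-in, assumed here purely for illustration, is a k-nearest-neighbour proxy: a state whose k-th neighbour is far away lies in a sparsely covered region and would therefore receive a favorable (low) state-density rank. `knn_state_density` is a hypothetical helper, not the paper's estimator.

```python
import numpy as np


def knn_state_density(states, k=5):
    """Hypothetical k-NN density proxy (illustrative, not the paper's estimator).

    Returns 1 / (distance to the k-th nearest other state); larger values mean
    the state sits in a more densely covered region. Builds the full N x N
    distance matrix, so it is only suitable for small examples.
    """
    states = np.asarray(states, dtype=float)
    if states.ndim == 1:
        states = states[:, None]  # treat a 1-D input as N scalar states
    n = states.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of states")
    diffs = states[:, None, :] - states[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Each sorted row starts with the zero self-distance, so column k is the k-th neighbour.
    kth_dist = np.sort(dists, axis=1)[:, k]
    return 1.0 / (kth_dist + 1e-8)


states = np.array([[0.0], [0.1], [0.2], [5.0]])  # the last state is an outlier
density = knn_state_density(states, k=1)
print(int(np.argmin(density)))  # 3
```

The brute-force distance matrix keeps the example short; at the scale of typical D4RL datasets (on the order of a million transitions), an approximate nearest-neighbour index would be needed instead.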
πŸ”Ž Similar Papers
No similar papers found.