AI Summary
This work addresses two problems in offline reinforcement learning that arise when training relies on a small number of imperfect trajectories: data scarcity and suboptimal data quality. To mitigate them, the paper proposes a trajectory-level data augmentation method that exploits the geometric relationships among the reward function, the value function, and properties of the behavior (logging) policy. Because it builds on the intrinsic geometry of the task, the approach remains compatible with suboptimal data-collecting policies, and the paper provides theoretical justification for the augmentation. This is the first study to integrate trajectory-level augmentation with task-specific geometric structure. Experiments show improved performance and data efficiency on positioning tasks of varying dimensionality, including partially observable settings.
Abstract
We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. In particular, our approach enables off-policy models to be trained from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationships among rewards, value functions, and mathematical properties of logging policies. Our augmentation accommodates suboptimal logging policies during data collection, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically on positioning tasks of varying dimensionality and under partial observability.
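The abstract does not specify the augmentation itself. As a loose illustration of how task geometry can let one trajectory stand in for several, the sketch below applies a rotational-symmetry augmentation to a toy 2-D positioning task. This is a generic technique and not necessarily the authors' method. It rests on two hypothetical assumptions: the reward is the negative Euclidean distance from the next position to a fixed target, and the dynamics are s' = s + a. Under those assumptions, rotating every state about the target, and every action by the same angle, yields another dynamically consistent trajectory with identical rewards. All names (`Transition`, `rollout`, `augment_trajectory`) are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float]


@dataclass
class Transition:
    state: Vec        # agent position (x, y) before the step
    action: Vec       # displacement applied during the step
    reward: float
    next_state: Vec   # agent position after the step


def reward_fn(pos: Vec, target: Vec) -> float:
    # Hypothetical reward: negative Euclidean distance to the target.
    return -math.dist(pos, target)


def rotate_vec(v: Vec, theta: float) -> Vec:
    # Rotate a free vector (e.g. an action) by theta radians.
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def rotate_about(p: Vec, center: Vec, theta: float) -> Vec:
    # Rotate a point by theta radians around `center`.
    dx, dy = rotate_vec((p[0] - center[0], p[1] - center[1]), theta)
    return (center[0] + dx, center[1] + dy)


def rollout(start: Vec, actions: List[Vec], target: Vec) -> List[Transition]:
    # Build a logged trajectory under the assumed dynamics s' = s + a.
    traj, pos = [], start
    for a in actions:
        nxt = (pos[0] + a[0], pos[1] + a[1])
        traj.append(Transition(pos, a, reward_fn(nxt, target), nxt))
        pos = nxt
    return traj


def augment_trajectory(traj: List[Transition], target: Vec,
                       theta: float) -> List[Transition]:
    # Rotate the whole trajectory about the target. Rewards are recomputed
    # and equal the originals because distance to the target is invariant
    # under this rotation; s' = s + a still holds for the rotated data.
    out = []
    for t in traj:
        s = rotate_about(t.state, target, theta)
        a = rotate_vec(t.action, theta)
        s_next = rotate_about(t.next_state, target, theta)
        out.append(Transition(s, a, reward_fn(s_next, target), s_next))
    return out


# Usage: one logged trajectory becomes four (original + three rotations).
target = (1.0, 2.0)
traj = rollout((0.0, 0.0), [(0.5, 0.5), (0.25, 1.0), (0.5, 0.25)], target)
augmented = [augment_trajectory(traj, target, k * math.pi / 2)
             for k in range(1, 4)]
```

The rotated copies have exactly the original returns, so this particular transformation adds state-space coverage rather than better-quality behavior. It is valid only when both the reward and the dynamics are invariant under the chosen rotation.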