Offline Hierarchical Reinforcement Learning via Inverse Optimization

๐Ÿ“… 2024-10-10
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Hierarchical reinforcement learning (HRL) from static offline data is hindered by unobservable high-level actions and heterogeneous policy architectures, leading to poor trainability. Method: We propose OHIO, the first framework to integrate inverse optimization into offline hierarchical RLโ€”unsupervisedly inferring latent high-level actions from observed trajectories to construct hierarchy-aligned training data. Grounded in hierarchical MDP modeling, OHIO is compatible with offline RL algorithms (e.g., BCQ, CQL) and enables cross-architecture data transfer. Results: Evaluated on robotic manipulation and network resource scheduling tasks, OHIO significantly outperforms end-to-end offline RL baselines, improving policy robustness by 37%. It supports offline pretraining followed by online fine-tuning, eliminating the conventional requirement for observable high-level actions in offline HRL.

Technology Category

Application Category

๐Ÿ“ Abstract
Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the inverse problem, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed.
Problem

Research questions and friction points this paper is trying to address.

Learning hierarchical policies from static offline datasets
Recovering unobservable high-level actions in hierarchical controllers
Improving robustness and performance in offline reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline hierarchical reinforcement learning framework
Inverse optimization recovers unobservable high-level actions
Improves robustness over end-to-end RL methods
๐Ÿ”Ž Similar Papers
No similar papers found.