Realizable Abstractions: Near-Optimal Hierarchical Reinforcement Learning

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing notions of MDP abstraction in hierarchical reinforcement learning (HRL) have limited expressive power and lack formal efficiency guarantees, making them inadequate for large-scale MDPs. Method: the paper proposes *Realizable Abstraction*, a relation between generic low-level MDPs and their associated high-level decision processes that avoids non-Markovianity issues and carries provable near-optimality guarantees: any abstract policy can be translated into a near-optimal low-level policy through a suitable composition of options, each obtained as the solution of a specific constrained MDP. Contribution/Results: the resulting Realizable Abstraction Reinforcement Learning (RARL) algorithm is Probably Approximately Correct (PAC), converges in a polynomial number of samples, and is robust to inaccuracies in the input abstraction, providing a theoretically grounded foundation for scalable HRL.

📝 Abstract
The main focus of Hierarchical Reinforcement Learning (HRL) is studying how large Markov Decision Processes (MDPs) can be more efficiently solved when addressed in a modular way, by combining partial solutions computed for smaller subtasks. Despite their very intuitive role for learning, most notions of MDP abstractions proposed in the HRL literature have limited expressive power or do not possess formal efficiency guarantees. This work addresses these fundamental issues by defining Realizable Abstractions, a new relation between generic low-level MDPs and their associated high-level decision processes. The notion we propose avoids non-Markovianity issues and has desirable near-optimality guarantees. Indeed, we show that any abstract policy for Realizable Abstractions can be translated into near-optimal policies for the low-level MDP, through a suitable composition of options. As demonstrated in the paper, these options can be expressed as solutions of specific constrained MDPs. Based on these findings, we propose RARL, a new HRL algorithm that returns compositional and near-optimal low-level policies, taking advantage of the Realizable Abstraction given in the input. We show that RARL is Probably Approximately Correct, it converges in a polynomial number of samples, and it is robust to inaccuracies in the abstraction.
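The abstract states that the options realizing an abstract policy can be expressed as solutions of specific constrained MDPs. As a generic illustration of that building block (not the paper's actual construction, and with all numbers made up), a discounted constrained MDP can be solved as a linear program over occupancy measures: maximize expected discounted reward subject to flow-conservation constraints and a budget on expected discounted cost.

```python
import numpy as np
from scipy.optimize import linprog

# Toy constrained MDP: 2 states, 2 actions; all quantities are illustrative.
nS, nA, gamma = 2, 2, 0.9
P = np.zeros((nS, nA, nS))                # P[s, a, s'] transition probabilities
P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.8, 0.2]; P[1, 1] = [0.1, 0.9]
r = np.array([[0.0, 1.0], [0.5, 0.0]])    # reward r[s, a]
c = np.array([[0.0, 1.0], [0.0, 0.0]])    # cost   c[s, a]
kappa = 2.0                               # budget on expected discounted cost
d0 = np.array([1.0, 0.0])                 # initial-state distribution

# Decision variable: discounted occupancy measure mu[s, a], flattened.
# Flow constraints: sum_a mu[s',a] - gamma * sum_{s,a} P[s,a,s'] mu[s,a] = d0[s'].
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = float(sp == s) - gamma * P[s, a, sp]
b_eq = d0
A_ub = c.reshape(1, -1)                   # expected discounted cost <= kappa
b_ub = [kappa]

# linprog minimizes, so negate the reward vector.
res = linprog(-r.reshape(-1), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mu = res.x.reshape(nS, nA)
policy = mu / mu.sum(axis=1, keepdims=True)   # recover pi(a|s) from occupancies
```

The recovered `policy` is the stochastic policy induced by the optimal occupancy measure; in the paper's setting, each option would come from such a constrained problem tailored to one abstract transition.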
Problem

Research questions and friction points this paper is trying to address.

Defines Realizable Abstractions for hierarchical reinforcement learning
Ensures near-optimal policies via compositional options in MDPs
Proposes RARL algorithm with PAC guarantees and robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Realizable Abstractions linking MDPs with formal guarantees
Compositional near-optimal policies via constrained MDP options
RARL algorithm ensures polynomial sample convergence and robustness