Learning Multi-Timescale Abstractions for Hierarchical Combinatorial Planning

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of large action spaces, high stochasticity, long decision horizons, and resource constraints in sequential stochastic combinatorial optimization (SSCO) by proposing a model-based hierarchical reinforcement learning framework. The approach pairs a latent-space tree-search planner with a world model built for the semi-Markov decision process (SMDP) that the high-level policy faces, where abstract actions have variable durations. A multi-timescale objective structures the latent dynamics so that transitions reflect the effective duration of each abstract action, and a subgoal-conditioned budget policy is learned jointly with the world model to enable context-aware resource allocation. The key innovation is incorporating adaptive temporal abstraction into hierarchical planning. Across multiple challenging SSCO benchmarks, the method outperforms strong baselines.

📝 Abstract

The combination of exponentially large action spaces, stochastic dynamics, and long-horizon decision-making under limited resources makes Sequential Stochastic Combinatorial Optimization (SSCO) particularly challenging for reinforcement learning. Hierarchical Reinforcement Learning (HRL) offers a natural decomposition, but it places the high-level policy in a Semi-Markov Decision Process (SMDP) where actions have variable durations, making it difficult to learn a world model that is suitable for planning. We introduce a model-based hierarchical framework for sequential stochastic combinatorial decision-making that directly addresses this issue. Our method combines a latent-space tree-search planner with an SMDP-aware world model for variable-duration decisions. A multi-timescale objective structures the latent dynamics so that transition magnitudes reflect the effective temporal scales of abstract actions, enabling efficient lookahead under adaptive temporal abstraction. We further learn a subgoal-conditioned budget policy jointly with the world model to support context-aware resource allocation. Across challenging SSCO benchmarks, our method outperforms strong baselines.
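To make the SMDP difficulty concrete, the following is a minimal sketch of the standard SMDP value backup, in which an abstract action that lasts several primitive steps is discounted by its actual duration. It illustrates the general SMDP formulation only, not the paper's world model or training objective; the function names `smdp_return` and `smdp_target` are hypothetical.

```python
from typing import List, Tuple


def smdp_return(rewards: List[float], gamma: float) -> Tuple[float, float]:
    """Collapse the primitive rewards collected during one abstract action.

    Returns (discounted reward sum, gamma ** duration), where the duration
    is the number of primitive steps the abstract action lasted.
    """
    total = 0.0
    discount = 1.0
    for r in rewards:
        total += discount * r   # reward at step k is weighted by gamma ** k
        discount *= gamma       # after the loop, discount == gamma ** len(rewards)
    return total, discount


def smdp_target(rewards: List[float], gamma: float, next_value: float) -> float:
    """One-step SMDP backup target for a high-level (abstract) action.

    The bootstrap value is discounted by gamma ** duration rather than a
    single gamma, which is what distinguishes the SMDP from a plain MDP.
    """
    total, discount = smdp_return(rewards, gamma)
    return total + discount * next_value


if __name__ == "__main__":
    # An abstract action that ran for three primitive steps.
    print(smdp_target([1.0, 1.0, 1.0], gamma=0.5, next_value=8.0))
```

Because the bootstrap discount depends on how long each abstract action runs, a world model used for high-level planning has to account for duration as well as the next state, which is the issue the abstract says the SMDP-aware world model is meant to address.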

Problem

Research questions and friction points this paper is trying to address.

Sequential Stochastic Combinatorial Optimization

Hierarchical Reinforcement Learning

Semi-Markov Decision Process

World Model

Multi-Timescale Abstraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Reinforcement Learning

Multi-Timescale Abstraction

SMDP-Aware World Model