Sequential Stochastic Combinatorial Optimization Using Hierarchical Reinforcement Learning

📅 2025-02-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the Sequential Stochastic Combinatorial Optimization (SSCO) problem, with applications to adaptive influence maximization and epidemic intervention, moving beyond the conventional assumption of uniform budget allocation. We propose WS-option (wake-sleep option), a two-level option-based hierarchical reinforcement learning framework. It is the first to jointly model the coupled bilevel Markov decision process (MDP), in which budget allocation is decided at the upper level and node selection at the lower level. By combining option-based temporal abstraction with policy-gradient optimization and stability regularization, the method effectively mitigates cyclic interference between the two levels. Extensive experiments on multiple real-world networks demonstrate significant improvements over state-of-the-art baselines. The model also generalizes well: it supports zero-shot transfer to substantially larger graphs, drastically reducing retraining overhead.

📝 Abstract
Reinforcement learning (RL) has emerged as a promising tool for combinatorial optimization (CO) problems due to its ability to learn fast, effective, and generalizable solutions. Nonetheless, existing works mostly focus on one-shot deterministic CO, while sequential stochastic CO (SSCO) has rarely been studied despite its broad applications such as adaptive influence maximization (IM) and infectious disease intervention. In this paper, we study the SSCO problem where we first decide the budget (e.g., number of seed nodes in adaptive IM) allocation for all time steps, and then select a set of nodes for each time step. The few existing studies on SSCO simplify the problems by assuming a uniformly distributed budget allocation over the time horizon, yielding suboptimal solutions. We propose a generic hierarchical RL (HRL) framework called wake-sleep option (WS-option), a two-layer option-based framework that simultaneously decides adaptive budget allocation on the higher layer and node selection on the lower layer. WS-option starts with a coherent formulation of the two-layer Markov decision processes (MDPs), capturing the interdependencies between the two layers of decisions. Building on this, WS-option employs several innovative designs to balance the model's training stability and computational efficiency, preventing the vicious cyclic interference issue between the two layers. Empirical results show that WS-option exhibits significantly improved effectiveness and generalizability compared to traditional methods. Moreover, the learned model can be generalized to larger graphs, which significantly reduces the overhead of computational resources.
Problem

Research questions and friction points this paper is trying to address.

Sequential stochastic combinatorial optimization
Adaptive budget allocation
Hierarchical reinforcement learning framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Reinforcement Learning framework
Wake-sleep option design
Two-layer Markov decision processes
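The two-layer decision structure listed above can be illustrated with a minimal rollout sketch: an upper-level policy allocates a per-step budget (an "option"), and a lower-level policy then selects that many nodes. All names and the random placeholder policies here are hypothetical illustrations of the bilevel MDP, not the paper's trained WS-option model.

```python
import random

def upper_policy(remaining_budget, remaining_steps):
    # Hypothetical upper-level policy: choose this step's budget.
    # A trained WS-option model would learn this allocation instead.
    if remaining_steps == 1:
        return remaining_budget  # spend whatever is left
    return random.randint(0, remaining_budget)

def lower_policy(graph_nodes, selected, k):
    # Hypothetical lower-level policy: pick k not-yet-selected nodes.
    candidates = [n for n in graph_nodes if n not in selected]
    return set(random.sample(candidates, min(k, len(candidates))))

def rollout(graph_nodes, total_budget, horizon, seed=0):
    """One bilevel episode: budget allocation (upper), node selection (lower)."""
    random.seed(seed)
    selected, remaining, allocations = set(), total_budget, []
    for t in range(horizon):
        k = upper_policy(remaining, horizon - t)        # upper-level action
        picks = lower_policy(graph_nodes, selected, k)  # lower-level actions
        selected |= picks
        remaining -= len(picks)
        allocations.append(len(picks))
    return allocations, selected

allocations, seeds = rollout(list(range(50)), total_budget=10, horizon=4)
print(allocations, len(seeds))
```

The key coupling the paper formalizes is visible even in this toy loop: the upper-level budget constrains the lower-level node choices, and the lower-level selections change the state (here, `selected` and `remaining`) that the upper level conditions on at the next step.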