Black-Box Combinatorial Optimization with Order-Invariant Reinforcement Learning

📅 2025-10-02

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

Traditional Estimation-of-Distribution Algorithms (EDAs) for black-box combinatorial optimization rely on explicit variable dependency graphs, making them ill-suited for modeling complex, high-order interactions among variables. Method: This paper proposes a permutation-invariant reinforcement learning framework that trains an autoregressive generative model on randomly permuted variable sequences; sequence randomization acts as an information-preserving dropout to enforce permutation invariance. It integrates Generalized Reinforcement Policy Optimization (GRPO) with a scale-invariant advantage function, eliminating assumptions about fixed variable ordering. Contribution/Results: By bypassing explicit dependency graph learning, the method significantly reduces computational overhead and improves sample efficiency and search robustness. It achieves state-of-the-art performance across diverse benchmark algorithms and problem scales, effectively mitigating catastrophic failure.

Technology Category

Application Category

📝 Abstract

We introduce an order-invariant reinforcement learning framework for black-box combinatorial optimization. Classical estimation-of-distribution algorithms (EDAs) often rely on learning explicit variable dependency graphs, which can be costly and fail to capture complex interactions efficiently. In contrast, we parameterize a multivariate autoregressive generative model trained without a fixed variable ordering. By sampling random generation orders during training - a form of information-preserving dropout - the model is encouraged to be invariant to variable order, promoting search-space diversity and shaping the model to focus on the most relevant variable dependencies, improving sample efficiency. We adapt Generalized Reinforcement Policy Optimization (GRPO) to this setting, providing stable policy-gradient updates from scale-invariant advantages. Across a wide range of benchmark algorithms and problem instances of varying sizes, our method frequently achieves the best performance and consistently avoids catastrophic failures.

Problem

Research questions and friction points this paper is trying to address.

Solving black-box combinatorial optimization problems efficiently

Learning complex variable dependencies without fixed ordering

Improving sample efficiency through order-invariant training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Order-invariant reinforcement learning for black-box optimization

Autoregressive model trained with random generation orders

Adapted GRPO for stable policy-gradient updates

🔎 Similar Papers

Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations