Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving

📅 2025-09-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Frequently re-solving mixed-integer linear programs (MILPs) in real-time operations incurs prohibitive computational overhead. Method: We propose a framework for deciding when to re-solve. It models re-solve timing as a sequential decision problem that departs from the standard Markov decision process (MDP) structure, integrates change-point detection to identify environmental shifts, and uses proximal policy optimization (PPO) to learn a re-solving policy. Contribution/Results: Theoretically, we establish an explicit relationship between the number of re-solves and the total re-solving cost. Practically, we design an environment-aware adaptive sampling mechanism and introduce a benchmark suite and evaluation criteria tailored to real-time MILP re-solving. Experiments on eight synthetic and real-world datasets show that our method outperforms baselines by 2%–17% while balancing solution performance against solving cost.
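To make the change-point component concrete, here is a minimal sketch of a two-sided CUSUM detector that flags candidate re-solve times on a scalar data stream. CUSUM is a swapped-in, textbook detector chosen for illustration; the summary does not say which detector the paper uses. The class name `CusumDetector`, the helper `resolve_times`, and the `drift`/`threshold` defaults are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CusumDetector:
    """Two-sided CUSUM change-point detector on a scalar stream (illustrative only).

    target: assumed mean of the stream before a change.
    drift: slack subtracted each step so small fluctuations do not accumulate.
    threshold: alarm level; larger values give fewer, later alarms.
    """
    target: float
    drift: float = 0.5
    threshold: float = 5.0
    pos: float = 0.0  # accumulated evidence of an upward shift
    neg: float = 0.0  # accumulated evidence of a downward shift

    def update(self, x: float) -> bool:
        """Feed one observation; return True when either statistic crosses the threshold."""
        deviation = x - self.target
        self.pos = max(0.0, self.pos + deviation - self.drift)
        self.neg = max(0.0, self.neg - deviation - self.drift)
        return self.pos > self.threshold or self.neg > self.threshold

    def reset(self, new_target: float) -> None:
        """Re-center the detector on a new regime and clear the accumulated evidence."""
        self.target = new_target
        self.pos = 0.0
        self.neg = 0.0


def resolve_times(stream: list[float], detector: CusumDetector) -> list[int]:
    """Return the indices at which a hypothetical re-solve would be triggered."""
    triggers = []
    for t, x in enumerate(stream):
        if detector.update(x):
            triggers.append(t)
            # Assumption: the latest observation is a usable estimate of the new mean.
            detector.reset(new_target=x)
    return triggers


# A level shift from 10 to 13 at step 20 is flagged two steps later, at index 22.
print(resolve_times([10.0] * 20 + [13.0] * 20, CusumDetector(target=10.0)))
```

The detection delay (index 22 for a shift at index 20) is governed by `drift` and `threshold`. How the paper couples detection with the learned policy is not specified in this summary; one plausible design is to pass the alarm to the policy as an input feature rather than to re-solve on every alarm.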

📝 Abstract

A common challenge in real-time operations is deciding whether to re-solve an optimization problem or continue using an existing solution. While modern data platforms may collect information at high frequencies, many real-time operations require repeatedly solving computationally intensive optimization problems formulated as Mixed-Integer Linear Programs (MILPs). Determining when to re-solve is, therefore, an economically important question. This problem poses several challenges: 1) how to characterize solution optimality and solving cost; 2) how to detect environmental changes and select beneficial samples for solving the MILP; 3) given the long time horizon and non-MDP structure, vanilla reinforcement learning (RL) methods are not directly applicable and tend to suffer from value function explosion. Existing literature largely focuses on heuristics, low-data settings, and smooth objectives, with little attention to the NP-hard MILPs common in practice. We propose a framework called Proximal Policy Optimization with Change Point Detection (POC), which offers a systematic way to balance performance and cost when deciding when to re-solve. Theoretically, we establish the relationship between the number of re-solves and the re-solving cost. To test our framework, we assemble eight synthetic and real-world datasets and show that POC consistently outperforms existing baselines by 2%–17%. As a side benefit, our work fills a gap in the literature by introducing real-time MILP benchmarks and evaluation criteria.
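The performance-versus-cost trade-off described above can be made concrete with a toy accounting. This sketch is illustrative only: the scalar `optima` sequence standing in for the current MILP optimum, the absolute-gap staleness penalty, the flat per-solve cost, and the uncharged initial solve are all assumptions, not the paper's cost model or its theoretical result. The functions `total_cost` and `periodic` are hypothetical helpers.

```python
from __future__ import annotations


def total_cost(optima: list[float], resolve_at: set[int], solve_cost: float) -> tuple[float, int]:
    """Toy cost of a re-solve schedule (hypothetical model).

    optima[t] stands in for the optimal value at step t. A re-solve at step t
    pays solve_cost and refreshes the incumbent to optima[t]; every step pays
    the gap between the (possibly stale) incumbent and optima[t].
    Returns (staleness + solving cost, number of re-solves).
    """
    incumbent = optima[0]  # assumption: an initial solve at t = 0 that is not charged
    staleness = 0.0
    n_resolves = 0
    for t, opt in enumerate(optima):
        if t in resolve_at:
            incumbent = opt
            n_resolves += 1
        staleness += abs(opt - incumbent)
    return staleness + solve_cost * n_resolves, n_resolves


def periodic(horizon: int, every: int) -> set[int]:
    """Fixed-interval re-solving, a common heuristic baseline."""
    return set(range(every, horizon, every))


# One regime switch at step 10, with each re-solve costing 3 units.
optima = [0.0] * 10 + [4.0] * 10
print(total_cost(optima, set(), 3.0))            # never re-solve
print(total_cost(optima, periodic(20, 1), 3.0))  # re-solve every step
print(total_cost(optima, periodic(20, 7), 3.0))  # re-solve every 7 steps
print(total_cost(optima, {10}, 3.0))             # one re-solve at the switch
```

In this toy, never re-solving costs 40 (all staleness), re-solving every step costs 57 (all solving), a 7-step schedule costs 22, and a single re-solve timed at the switch costs 3. The point is that when to re-solve matters more than how often, which is the decision a learned policy is meant to make.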

Problem

Research questions and friction points this paper is trying to address.

Determining when to re-solve computationally costly MILPs

Balancing solution performance against computational solving cost

Handling the non-MDP structure of re-solving decisions in real-time operations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proximal Policy Optimization with Change Point Detection (POC)

Balancing performance and cost when choosing re-solving times

Establishing the relationship between the number of re-solves and the re-solving cost
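The PPO component listed above optimizes the standard clipped surrogate objective. Below is a minimal pure-Python sketch of that loss for a batch of samples. In a re-solving setting each action would plausibly be binary (continue or re-solve), but that action space, the advantage estimates, and the `clip_eps` default of 0.2 are assumptions here; the summary does not state the paper's PPO settings. The function name `ppo_clip_loss` is hypothetical.

```python
from __future__ import annotations

import math


def ppo_clip_loss(
    new_logp: list[float],
    old_logp: list[float],
    advantages: list[float],
    clip_eps: float = 0.2,
) -> float:
    """Negated PPO clipped surrogate objective (lower is better), averaged over samples.

    For each sample, ratio = pi_new(a|s) / pi_old(a|s) = exp(new_logp - old_logp).
    The per-sample objective is min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    which removes the incentive to push the ratio outside [1 - eps, 1 + eps].
    """
    n = len(advantages)
    if n == 0 or len(new_logp) != n or len(old_logp) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / n


# The policy doubled the probability of an action with advantage +1:
# the gain is capped at ratio 1.2, so the loss is about -1.2 rather than -2.0.
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms makes the update pessimistic: improvements beyond the clip range earn nothing extra, while changes that hurt a sample's objective are still fully penalized.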

🔎 Similar Papers

Unsupervised Machine Learning Hybrid Approach Integrating Linear Programming in Loss Function: A Robust Optimization Technique