Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work studies data-driven learning of Lagrange multipliers for mixed-integer linear programming. Casting multiplier learning as a statistical learning problem over a distribution of problem instances, it derives a generalization upper bound of $O(s^{1.5}/\sqrt{N})$ and a minimax lower bound of $\Omega(s/\sqrt{N})$, where $s$ is the number of coupling constraints and $N$ is the sample size. The two bounds differ by a factor of $\sqrt{s}$. The authors close this gap by showing that stochastic gradient ascent with averaging attains the minimax-optimal rate of $\Theta(s/\sqrt{N})$. In the learning-to-warm-start setting, the rate improves to $\Theta(s/N)$, which is also minimax-optimal. The study thus provides theoretical guarantees and a provably optimal learning framework for data-driven Lagrangian relaxation.
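For orientation, the objects named in the summary can be written out in standard Lagrangian-relaxation notation. This is a sketch only: the symbols $c_I$, $A_I$, $b_I$, $X_I$, and $\mathcal{D}$ are assumptions introduced here and may differ from the paper's own notation and sign conventions.

```latex
\begin{aligned}
\text{MILP instance } I:\quad
  & \min_{x \in X_I} \; c_I^\top x
    \quad \text{s.t.} \quad A_I x \le b_I
    \quad (s \text{ coupling constraints}),\\
\text{Lagrangian dual function:}\quad
  & L_I(\lambda) = \min_{x \in X_I} \; c_I^\top x + \lambda^\top (A_I x - b_I),
    \qquad \lambda \in \mathbb{R}^s_{\ge 0},\\
\text{learning objective:}\quad
  & \max_{\lambda \ge 0} \; \mathbb{E}_{I \sim \mathcal{D}}\bigl[L_I(\lambda)\bigr]
    \;\approx\; \max_{\lambda \ge 0} \; \frac{1}{N} \sum_{i=1}^{N} L_{I_i}(\lambda).
\end{aligned}
```

By weak duality, $L_I(\lambda)$ is a lower bound on the optimal value of instance $I$ for every $\lambda \ge 0$, which is why larger dual values are better. The rates quoted above describe how the quality of the learned $\lambda$ scales with $s$ and $N$.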
📝 Abstract
Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP) problems, particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the coupling constraints, LR enables parallel subproblem solving and often yields tighter dual bounds than standard linear programming relaxations, which is crucial for efficient branch-and-bound pruning. While recent empirical work has shown promising results using machine learning to predict the Lagrange multipliers, a theoretical understanding of such methods remains an open question. In this work, we bridge this gap by analyzing the problem of learning LR through the lens of Data-driven Algorithm Design, i.e., as a statistical learning problem over a distribution of problem instances. Our contributions are as follows. First, we derive a generalization bound of $\mathcal{O}(s^{1.5}/\sqrt{N})$ for the learned multipliers, where $s$ is the number of coupling constraints and $N$ is the sample size. Second, we provide a minimax lower bound of $\Omega(s/\sqrt{N})$, proving that a linear dependence on $s$ is unavoidable. Third, we constructively close this theoretical gap by proving that Stochastic Gradient Ascent (SGA) with averaging achieves the minimax-optimal rate $\Theta(s/\sqrt{N})$. Finally, we extend our framework to the learning-to-warm-start setting, proving that it achieves a fast, minimax-optimal rate of $\Theta(s/N)$ and establishing a theoretical advantage over direct multiplier prediction.
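To make the idea of SGA with averaging concrete, here is a minimal Python sketch on a toy problem. It is not the paper's algorithm or experimental setup. The instance distribution (`sample_instance`), the binary knapsack-style subproblem that is solvable in closed form, the step size `step / sqrt(t)`, and all function names are assumptions chosen for illustration.

```python
import random


def dual_value_and_supergradient(lam, c, A, b):
    """Evaluate L(lam) = min over x in {0,1}^n of c.x + lam.(A x - b).

    The minimization separates per variable: x_j = 1 exactly when its
    reduced cost c_j + sum_i lam_i * A[i][j] is negative.
    Returns the dual value and a supergradient A x* - b.
    """
    s, n = len(b), len(c)
    reduced = [c[j] + sum(lam[i] * A[i][j] for i in range(s)) for j in range(n)]
    x = [1.0 if r < 0 else 0.0 for r in reduced]
    value = sum(r * xj for r, xj in zip(reduced, x)) - sum(l * bi for l, bi in zip(lam, b))
    grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(s)]
    return value, grad


def sample_instance(rng, n=20, s=3):
    """Hypothetical instance distribution (an assumption, not from the paper)."""
    c = [-rng.uniform(0.5, 1.5) for _ in range(n)]                      # negative costs
    A = [[rng.uniform(0.0, 1.0) for _ in range(n)] for _ in range(s)]   # coupling rows
    b = [2.5] * s                                                       # tight capacities
    return c, A, b


def averaged_sga(instances, s, step=0.1):
    """One pass of projected stochastic gradient ascent; returns the iterate average."""
    lam = [0.0] * s
    lam_sum = [0.0] * s
    for t, (c, A, b) in enumerate(instances, start=1):
        lam_sum = [acc + l for acc, l in zip(lam_sum, lam)]    # accumulate current iterate
        _, g = dual_value_and_supergradient(lam, c, A, b)
        eta = step / t ** 0.5                                  # assumed decaying step size
        lam = [max(0.0, l + eta * gi) for l, gi in zip(lam, g)]  # ascent step, project to lam >= 0
    return [acc / len(instances) for acc in lam_sum]


def mean_dual_value(lam, instances):
    """Average dual bound of a fixed multiplier vector over a list of instances."""
    return sum(dual_value_and_supergradient(lam, *inst)[0] for inst in instances) / len(instances)


rng = random.Random(0)
train = [sample_instance(rng) for _ in range(2000)]
held_out = [sample_instance(rng) for _ in range(200)]
lam_avg = averaged_sga(train, s=3)
print("mean dual bound, lam = 0:      ", mean_dual_value([0.0] * 3, held_out))
print("mean dual bound, averaged lam: ", mean_dual_value(lam_avg, held_out))
```

Each training instance is used once, so the number of SGA steps equals the sample size $N$. Averaging the iterates, rather than returning the last one, is the variant the abstract refers to. In a real MILP setting the closed-form inner minimization would be replaced by calls to a subproblem solver.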
Problem

Research questions and friction points this paper is trying to address.

Lagrangian Relaxation
Mixed Integer Linear Programming
Data-driven Algorithm Design
Generalization Bound
Minimax Optimality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lagrangian Relaxation
Data-driven Algorithm Design
Generalization Bound
Minimax Optimality
Learning to Warm Start