A Single-Sample Polylogarithmic Regret Bound for Nonstationary Online Linear Programming

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses non-stationary online linear programming under resource constraints, where orders arrive sequentially as independent but non-identically distributed samples, and the decision-maker must make real-time accept/reject decisions using only a single historical sample. To tackle this highly non-stationary setting, the authors propose a novel re-solving algorithm that integrates dynamic programming principles with a dual-based framework, effectively adapting to distributional shifts in the large-resource regime. Theoretical analysis shows that when total resources scale proportionally with the number of orders, the algorithm achieves an $O((\log n)^2)$ regret bound over the entire sequence. This result establishes the first polylogarithmic performance guarantee under the challenging conditions of single-sample access and strong non-stationarity, thereby bridging a critical gap between existing theory and real-world dynamic environments.

📝 Abstract
We study nonstationary Online Linear Programming (OLP), where $n$ orders arrive sequentially with reward-resource consumption pairs that form a sequence of independent, but not necessarily identically distributed, random vectors. At the beginning of the planning horizon, the decision-maker is provided with a resource endowment that is sufficient to fulfill a significant portion of the requests. The decision-maker seeks to maximize the expected total reward by making immediate and irrevocable acceptance or rejection decisions for each order, subject to this resource endowment. We focus on the challenging single-sample setting, where only one sample from each of the $n$ distributions is available at the start of the planning horizon. We propose a novel re-solving algorithm that integrates a dynamic programming perspective with the dual-based frameworks traditionally employed in stationary environments. In the large-resource regime, where the resource endowment scales linearly with the number of orders, we prove that our algorithm achieves $O((\log n)^2)$ regret across a broad class of nonstationary distribution sequences. Our results demonstrate that polylogarithmic regret is attainable even under significant environmental shifts and minimal data availability, bridging the gap between stationary OLP and more volatile real-world resource allocation problems.
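The paper's own algorithm is not spelled out here, but the dual-based re-solving idea it builds on can be illustrated with a minimal sketch for a single resource. The sketch below is an assumption-laden toy, not the authors' method: it estimates a dual price from the one historical sample by greedily filling the budget in decreasing reward-per-unit order, accepts an incoming order only when its reward beats the price times its consumption, and periodically re-solves on the remaining sample and remaining budget. The function names `dual_price` and `resolve_and_run` and the `resolve_every` schedule are hypothetical choices for illustration.

```python
def dual_price(sample, budget):
    """Toy dual-price estimate (not the paper's estimator): admit sample
    orders in decreasing reward-per-unit order until the budget is spent;
    return the last admitted order's reward/consumption ratio."""
    ranked = sorted(sample, key=lambda rc: rc[0] / rc[1], reverse=True)
    used, price = 0.0, 0.0
    for r, c in ranked:
        if used + c > budget:
            break
        used += c
        price = r / c
    return price

def resolve_and_run(orders, sample, budget, resolve_every=10):
    """Threshold policy with periodic re-solving: accept order t iff its
    reward exceeds the current price times its consumption and the
    remaining budget covers it."""
    remaining, total = float(budget), 0.0
    price = dual_price(sample, remaining)
    for t, (r, c) in enumerate(orders):
        if t % resolve_every == 0:
            # Re-solve on the not-yet-elapsed portion of the single
            # sample with the budget that is actually left.
            price = dual_price(sample[t:], remaining)
        if c <= remaining and r > price * c:
            remaining -= c
            total += r
    return total
```

In a nonstationary sequence the sample slice `sample[t:]` matters: unlike a stationary dual estimate, the price must track which distributions remain ahead, which is the role the paper's dynamic-programming perspective plays in a far more careful way.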
Problem

Research questions and friction points this paper is trying to address.

Nonstationary Online Linear Programming
Single-Sample
Resource Allocation
Regret Minimization
Sequential Decision Making
Innovation

Methods, ideas, or system contributions that make the work stand out.

nonstationary online linear programming
single-sample setting
polylogarithmic regret
re-solving algorithm
dynamic programming
Haoran Xu
Department of Management Science and Engineering, Stanford University
Owen Shen
Operations Research Center, Massachusetts Institute of Technology
Peter Glynn
Department of Management Science and Engineering, Stanford University
Yinyu Ye
Professor Emeritus, Stanford University; Visiting Professor at SJTU, CUHKSZ, and HKUST
Optimization - Operations Research - Mathematical Programming - Computational Science
Patrick Jaillet
Dugald C. Jackson Professor, EECS, MIT
Algorithms - Online Optimization - Learning - Operations Research