🤖 AI Summary
This study addresses nonstationary online linear programming under resource constraints, where orders arrive sequentially as independent but not necessarily identically distributed samples, and the decision-maker must make real-time accept-or-reject decisions with access to only a single historical sample per distribution. To handle this nonstationarity, the authors propose a re-solving algorithm that integrates dynamic programming principles with a dual-based framework, adapting to distributional shifts. In the large-resource regime, where total resources scale proportionally with the number of orders, the algorithm provably achieves an $O((\log n)^2)$ regret bound over the entire sequence. This establishes a polylogarithmic performance guarantee under the challenging combination of single-sample access and strong nonstationarity, bridging a gap between existing theory for stationary settings and real-world dynamic environments.
📝 Abstract
We study nonstationary Online Linear Programming (OLP), where $n$ orders arrive sequentially with reward-resource consumption pairs that form a sequence of independent, but not necessarily identically distributed, random vectors. At the beginning of the planning horizon, the decision-maker is provided with a resource endowment that is sufficient to fulfill a significant portion of the requests. The decision-maker seeks to maximize the expected total reward by making immediate and irrevocable acceptance or rejection decisions for each order, subject to this resource endowment. We focus on the challenging single-sample setting, where only one sample from each of the $n$ distributions is available at the start of the planning horizon.
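For concreteness, the offline benchmark that regret is typically measured against in this kind of OLP setting can be written as a linear program (the notation $(r_t, a_t)$ for the reward and resource consumption of order $t$, and $b$ for the endowment, is an assumption here, not taken from the paper):

$$
\max_{x \in [0,1]^n} \; \sum_{t=1}^{n} r_t x_t
\quad \text{s.t.} \quad \sum_{t=1}^{n} a_t x_t \le b,
$$

where the online policy, unlike the offline benchmark, must fix each decision $x_t \in \{0,1\}$ immediately and irrevocably when order $t$ arrives.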
We propose a novel re-solving algorithm that integrates a dynamic programming perspective with the dual-based frameworks traditionally employed in stationary environments. In the large-resource regime, where the resource endowment scales linearly with the number of orders, we prove that our algorithm achieves $O((\log n)^2)$ regret across a broad class of nonstationary distribution sequences. Our results demonstrate that polylogarithmic regret is attainable even under significant environmental shifts and minimal data availability, bridging the gap between stationary OLP and more volatile real-world resource allocation problems.
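The paper's algorithm is not spelled out in the abstract, but the dual-based re-solving idea it builds on can be sketched for a single resource: repeatedly estimate a dual price from the remaining single samples and the remaining budget, then accept an order only if its reward clears that price. Everything below (function names, the bisection routine, the per-step re-solve schedule) is an illustrative assumption, not the authors' method, and it omits their dynamic-programming component.

```python
# Illustrative single-resource, dual-threshold re-solving sketch.
# Not the paper's algorithm; it only shows the dual-price mechanics.
import random

def dual_price(samples, budget, iters=60):
    """Bisection for a one-resource dual price p. The budget consumed by
    sample orders accepted under the rule r > p*a is nonincreasing in p,
    so we search for (approximately) the smallest p whose consumption
    fits within the remaining budget."""
    def consumption(p):
        return sum(a for r, a in samples if r > p * a)
    lo = 0.0
    hi = max((r / a for r, a in samples if a > 0), default=0.0)
    if consumption(lo) <= budget:  # budget not binding: price zero
        return lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if consumption(mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi

def resolve_and_run(orders, samples, budget):
    """Run the re-solving policy: before each order t, recompute the dual
    price from the not-yet-seen single samples and the remaining budget,
    then accept order t iff its reward clears the price and it fits."""
    remaining, reward = budget, 0.0
    for t, (r, a) in enumerate(orders):
        p = dual_price(samples[t:], remaining)
        if a <= remaining and r > p * a:
            remaining -= a
            reward += r
    return reward, budget - remaining
```

This sketch re-solves before every order for simplicity; in practice (and in the literature) re-solving is often scheduled at a sparser set of times to trade computation against adaptivity to distributional shift.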