🤖 AI Summary
This study addresses nonstationary online linear programming under resource constraints, where orders arrive sequentially as independent but not necessarily identically distributed samples, and the decision-maker must make real-time accept-or-reject decisions with access to only a single historical sample per distribution. To handle this nonstationarity, the authors propose a re-solving algorithm that integrates dynamic programming principles with a dual-based framework, adapting to distributional shifts. In the large-resource regime, where total resources scale proportionally with the number of orders, the algorithm provably achieves an $O((\log n)^2)$ regret bound over the entire sequence. This establishes a polylogarithmic performance guarantee under the challenging combination of single-sample access and strong nonstationarity, bridging a gap between existing theory for stationary settings and real-world dynamic environments.
📝 Abstract
We study nonstationary Online Linear Programming (OLP), where $n$ orders arrive sequentially with reward-resource consumption pairs that form a sequence of independent, but not necessarily identically distributed, random vectors. At the beginning of the planning horizon, the decision-maker is provided with a resource endowment that is sufficient to fulfill a significant portion of the requests. The decision-maker seeks to maximize the expected total reward by making immediate and irrevocable acceptance or rejection decisions for each order, subject to this resource endowment. We focus on the challenging single-sample setting, where only one sample from each of the $n$ distributions is available at the start of the planning horizon.
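For concreteness, the offline benchmark that regret is typically measured against in this kind of OLP setting can be written as a linear program (the notation $(r_t, a_t)$ for the reward and resource consumption of order $t$, and $b$ for the endowment, is an assumption here, not taken from the paper):

$$
\max_{x \in [0,1]^n} \; \sum_{t=1}^{n} r_t x_t
\quad \text{s.t.} \quad \sum_{t=1}^{n} a_t x_t \le b,
$$

where the online policy, unlike the offline benchmark, must fix each decision $x_t \in \{0,1\}$ immediately and irrevocably when order $t$ arrives.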
We propose a novel re-solving algorithm that integrates a dynamic programming perspective with the dual-based frameworks traditionally employed in stationary environments. In the large-resource regime, where the resource endowment scales linearly with the number of orders, we prove that our algorithm achieves $O((\log n)^2)$ regret across a broad class of nonstationary distribution sequences. Our results demonstrate that polylogarithmic regret is attainable even under significant environmental shifts and minimal data availability, bridging the gap between stationary OLP and more volatile real-world resource allocation problems.
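The paper's algorithm is not spelled out in the abstract, but the dual-based re-solving idea it builds on can be sketched for a single resource: repeatedly estimate a dual price from the remaining single samples and the remaining budget, then accept an order only if its reward clears that price. Everything below (function names, the bisection routine, the per-step re-solve schedule) is an illustrative assumption, not the authors' method, and it omits their dynamic-programming component.

```python
# Illustrative single-resource, dual-threshold re-solving sketch.
# Not the paper's algorithm; it only shows the dual-price mechanics.
import random

def dual_price(samples, budget, iters=60):
    """Bisection for a one-resource dual price p. The budget consumed by
    sample orders accepted under the rule r > p*a is nonincreasing in p,
    so we search for (approximately) the smallest p whose consumption
    fits within the remaining budget."""
    def consumption(p):
        return sum(a for r, a in samples if r > p * a)
    lo = 0.0
    hi = max((r / a for r, a in samples if a > 0), default=0.0)
    if consumption(lo) <= budget:  # budget not binding: price zero
        return lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if consumption(mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi

def resolve_and_run(orders, samples, budget):
    """Run the re-solving policy: before each order t, recompute the dual
    price from the not-yet-seen single samples and the remaining budget,
    then accept order t iff its reward clears the price and it fits."""
    remaining, reward = budget, 0.0
    for t, (r, a) in enumerate(orders):
        p = dual_price(samples[t:], remaining)
        if a <= remaining and r > p * a:
            remaining -= a
            reward += r
    return reward, budget - remaining
```

This sketch re-solves before every order for simplicity; in practice (and in the literature) re-solving is often scheduled at a sparser set of times to trade computation against adaptivity to distributional shift.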