🤖 AI Summary
Online linear programming (OLP) faces a fundamental trade-off between computational efficiency and theoretical performance: frequent LP re-solving ensures accuracy but incurs high overhead, while infrequent updates risk suboptimal decisions.
Method: We propose a sparse resolution framework that solves the primal LP exactly only at $O(\log\log T)$ critical time points and performs lightweight first-order updates elsewhere. The approach integrates stochastic programming modeling, piecewise adaptive decision rules, and online learning under unknown finite-support distributions.
Contribution/Results: This is the first OLP algorithm achieving a constant regret bound under unknown finite-support distributions, while reducing LP solves to $O(\log\log T)$. We further introduce an adjustable-frequency infrequent-resolving framework: with $M$ total LP solves, it attains asymptotically optimal regret $O\big(T^{(1/2)^{M-1}}\big)$. Experiments demonstrate consistent superiority over state-of-the-art LP-based and LP-free baselines across diverse problem instances.
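To make the $O(\log\log T)$ solve count concrete, here is a minimal sketch of one schedule that achieves it: resolve at times $t_k = T - T^{(1/2)^k}$, so the remaining horizon shrinks from $T^{1/2}$ to $T^{1/4}$ to $T^{1/8}$, and so on, reaching a constant after about $\log_2 \log T$ steps. The specific schedule and stopping rule below are illustrative assumptions, not necessarily the paper's exact construction.

```python
import math

def resolve_schedule(T: int) -> list[int]:
    """Return candidate LP-solve times for horizon T under an assumed
    doubling schedule t_k = T - T^{(1/2)^k}. Each step squares-roots the
    remaining horizon, so the list has O(log log T) entries."""
    points = [0]  # solve once at the very beginning
    k = 1
    # Stop once the remaining horizon T^{(1/2)^k} is O(1).
    while T ** (0.5 ** k) > math.e:
        points.append(T - math.ceil(T ** (0.5 ** k)))
        k += 1
    return points
```

For example, with $T = 10^6$ the schedule contains only four solve points, clustered near the start and the end of the horizon, which matches the paper's observation that resolving at both the beginning and the end of the selling horizon is valuable.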
📝 Abstract
Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auctions, network revenue management, order fulfillment, and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former typically guarantee better performance, even offering a constant regret, but require solving a large number of LPs, which can be computationally expensive. In contrast, LP-free algorithms require only first-order computations but yield worse performance, lacking a constant regret bound. In this work, we bridge the gap between these two extremes by proposing a well-performing algorithm that solves LPs at a few selected time points and performs first-order computations at all other time points. Specifically, when the inputs are drawn from an unknown finite-support distribution, the proposed algorithm achieves a constant regret (even for the hard ``degenerate'' case) while solving LPs only $\mathcal{O}(\log\log T)$ times over the time horizon $T$. Moreover, when we are allowed to solve LPs only $M$ times, we design the corresponding schedule such that the proposed algorithm guarantees a nearly $\mathcal{O}\left(T^{(1/2)^{M-1}}\right)$ regret. Our work highlights the value of resolving both at the beginning and at the end of the selling horizon, and provides a novel framework for proving the performance guarantee of the proposed policy under different infrequent resolving schedules. Furthermore, when the arrival probabilities are known at the beginning, our algorithm guarantees a constant regret by solving LPs $\mathcal{O}(\log\log T)$ times, and a nearly $\mathcal{O}\left(T^{(1/2)^{M}}\right)$ regret by solving LPs only $M$ times. Numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.