Online Learning with Probing for Sequential User-Centric Selection

📅 2025-07-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
We address sequential decision-making problems, such as ride-hailing dispatch, wireless communication scheduling, and content recommendation, in which both resource availability and rewards are unknown a priori and acquiring side information through probing is costly. To this end, we propose the Probing-augmented User-Centric Selection (PUCS) framework, which enables a two-phase "probe-then-assign" policy. First, we formulate PUCS as a unified optimization model and design an offline greedy probing algorithm achieving a constant approximation ratio ζ = (e−1)/(2e−1). Second, we develop the online learning algorithm OLPA, which attains a regret bound of O(√T + ln²T); a matching Ω(√T) lower bound shows this is tight up to logarithmic factors. Combining combinatorial stochastic bandit learning, adaptive information-probing strategies, and rigorous probabilistic analysis, OLPA outperforms state-of-the-art baselines on real-world datasets, empirically validating both the efficacy of proactive probing and the user-centric assignment paradigm.

📝 Abstract
We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns $K$ plays to $M$ arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee $\zeta = (e-1)/(2e-1)$. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound $\mathcal{O}(\sqrt{T} + \ln^{2} T)$. We also prove a lower bound $\Omega(\sqrt{T})$, showing that the upper bound is tight up to logarithmic factors. Experiments on real-world data demonstrate the effectiveness of our solutions.
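The two-phase "probe-then-assign" idea from the abstract can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: the names `greedy_probe_then_assign`, `probe_budget`, and the cost-vs-prior-mean probing rule are assumptions made for illustration only.

```python
def greedy_probe_then_assign(mean_reward, probe_cost, K, probe_budget, sample):
    """Hypothetical sketch of a two-phase probe-then-assign policy.

    mean_reward: dict arm -> prior mean reward
    probe_cost:  dict arm -> cost of probing that arm
    sample:      callable(arm) -> realized value revealed by a probe
    """
    # Phase 1: greedily probe the most promising arms whose prior mean
    # exceeds the probing cost, up to the probe budget.
    probed = {}
    candidates = sorted(mean_reward, key=mean_reward.get, reverse=True)
    for arm in candidates:
        if len(probed) >= probe_budget:
            break
        if mean_reward[arm] > probe_cost[arm]:
            probed[arm] = sample(arm)  # side information from the probe

    # Phase 2: assign the K plays using probed values where available,
    # falling back to prior means for unprobed arms.
    score = {a: probed.get(a, mean_reward[a]) for a in mean_reward}
    return sorted(score, key=score.get, reverse=True)[:K]
```

Note how probing can reverse the prior ranking: an arm with a high prior mean may reveal a low realized value, so the assignment phase benefits from the acquired side information rather than committing to prior estimates.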
Problem

Research questions and friction points this paper is trying to address.

Sequential decision-making with costly information acquisition
Optimizing probing and selection in unknown resource-payoff scenarios
Balancing exploration-exploitation in online combinatorial bandit settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probing-augmented user-centric selection framework
Greedy probing algorithm with approximation guarantee
OLPA algorithm with tight regret bound
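For intuition on the online component, here is a simplified combinatorial-UCB sketch of the kind of stochastic bandit learning OLPA builds on: each round, play the $K$ arms with the highest upper-confidence indices. This is a generic textbook-style illustration, not the paper's OLPA (which additionally handles probing); the function name and return values are assumptions.

```python
import math

def combinatorial_ucb(arms, K, T, pull):
    """Simplified combinatorial UCB: per round, play the K arms with
    the highest UCB indices (empirical mean + confidence radius)."""
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(1, T + 1):
        def ucb(a):
            if counts[a] == 0:
                return float("inf")  # force at least one pull per arm
            return means[a] + math.sqrt(2 * math.log(t) / counts[a])
        chosen = sorted(arms, key=ucb, reverse=True)[:K]
        for a in chosen:
            r = pull(a)
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]  # incremental mean
    return means, counts
```

The confidence radius shrinks as an arm accumulates pulls, so exploration of weak arms tapers off; this is the mechanism behind the $\mathcal{O}(\sqrt{T})$-type regret guarantees discussed above.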