Near-Optimal Regret-Queue Length Tradeoff in Online Learning for Two-Sided Markets

📅 2025-10-15
🤖 AI Summary
This paper studies joint pricing and matching in two-sided markets with unknown demand and supply curves, aiming to maximize platform profit while keeping queue lengths small. For price-sensitive, heterogeneous customers and servers, it proposes an online-learning-based pricing policy with two components: a dynamic component that optimizes the trade-off between low regret and small queue lengths, and a probabilistic component that resolves the tension between collecting useful samples for fast learning and keeping queues short. For $\gamma \in (0, 1/6]$, the policy achieves $\tilde{O}(T^{1-\gamma})$ regret, $\tilde{O}(T^{\gamma/2})$ average queue length, and $\tilde{O}(T^{\gamma})$ maximum queue length, significantly improving over existing results. Apart from the restricted range of $\gamma$, this trade-off between regret and average queue length is optimal up to logarithmic factors within a class of policies, and it matches the optimal trade-off known for the setting where the demand and supply curves are given.

📝 Abstract
We study a two-sided market, wherein, price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer-server pair can then be matched by the platform, at which point, they leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit, while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-\gamma})$ regret, $\tilde{O}(T^{\gamma/2})$ average queue length, and $\tilde{O}(T^{\gamma})$ maximum queue length for $\gamma \in (0, 1/6]$, significantly improving over existing results [1]. Moreover, barring the permissible range of $\gamma$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors under a class of policies, matching the optimal one as in [2] which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths; and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.
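
To make the stated trade-off concrete, the bounds from the abstract can be evaluated at the largest permitted value $\gamma = 1/6$. This is plain arithmetic on the exponents quoted above, not an additional result; the last line only multiplies the two stated bounds to show how they move against each other as $\gamma$ varies.

```latex
% Exponents at the endpoint \gamma = 1/6 of the permitted range (0, 1/6]
\begin{align*}
\text{regret}               &= \tilde{O}\!\left(T^{1-\gamma}\right)  = \tilde{O}\!\left(T^{5/6}\right), \\
\text{average queue length} &= \tilde{O}\!\left(T^{\gamma/2}\right)  = \tilde{O}\!\left(T^{1/12}\right), \\
\text{maximum queue length} &= \tilde{O}\!\left(T^{\gamma}\right)    = \tilde{O}\!\left(T^{1/6}\right).
\end{align*}
% Product of the regret bound and the squared average-queue bound, for any \gamma:
\[
T^{1-\gamma} \cdot \left(T^{\gamma/2}\right)^{2} = T^{1-\gamma} \cdot T^{\gamma} = T .
\]
```

Read this way, a smaller $\gamma$ shortens the queues but pushes the regret bound toward linear in $T$, while a larger $\gamma$ lowers regret at the cost of longer queues.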
Problem

Research questions and friction points this paper is trying to address.

Design pricing and matching algorithms for two-sided markets
Maximize platform profit while maintaining reasonable queue lengths
Develop online learning policy with unknown demand-supply curves
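
The tension between profit and queue length can be seen in a toy simulation. The sketch below is not the paper's policy and does no learning: it assumes a single customer type and a single server type, known linear curves (`demand` and `supply` are hypothetical), Bernoulli arrivals in discrete slots, and a simple queue-length threshold rule that perturbs the price or wage by a hypothetical `delta`. All names and parameter values are illustrative assumptions.

```python
import random


def demand(p):
    """Hypothetical customer arrival probability per slot at price p."""
    return min(1.0, max(0.0, 1.0 - p))


def supply(w):
    """Hypothetical server arrival probability per slot at wage w."""
    return min(1.0, max(0.0, w))


def simulate(T, threshold=None, delta=0.1, seed=0):
    """Run T slots; return (average profit per slot, average total queue length).

    threshold=None keeps prices fixed at the balanced point; otherwise the
    side whose queue exceeds the threshold has its arrivals throttled.
    """
    rng = random.Random(seed)
    # For these assumed curves, the balanced (fluid-optimal) prices are
    # p0 = 0.75 and w0 = 0.25, so that demand(p0) == supply(w0) == 0.25.
    p0, w0 = 0.75, 0.25
    qc = qs = 0          # waiting customers / waiting servers
    profit = 0.0
    queue_sum = 0
    for _ in range(T):
        p, w = p0, w0
        if threshold is not None:
            if qc > threshold:
                p = p0 + delta   # too many customers waiting: raise the price
            elif qs > threshold:
                w = w0 - delta   # too many servers waiting: lower the wage
        c = rng.random() < demand(p)   # customer arrival this slot
        s = rng.random() < supply(w)   # server arrival this slot
        profit += p * c - w * s        # revenue collected minus wage paid
        qc += c
        qs += s
        matched = min(qc, qs)          # greedily match every available pair
        qc -= matched
        qs -= matched
        queue_sum += qc + qs
    return profit / T, queue_sum / T


if __name__ == "__main__":
    for label, th in (("static", None), ("threshold", 5)):
        avg_profit, avg_queue = simulate(20000, threshold=th, seed=1)
        print(f"{label:9s} profit/slot={avg_profit:.4f} avg queue={avg_queue:.2f}")
```

With fixed balanced prices the queue imbalance behaves like an unbiased random walk and drifts away from zero, whereas the threshold rule keeps it bounded by deviating from the balanced prices whenever a queue grows. The paper's setting is harder because the curves, and hence the balanced prices, must be learned online.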
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online learning policy for dynamic pricing optimization
Tradeoff management between regret and queue lengths
Probabilistic sampling to balance learning and queues
Zixian Yang
EECS, University of Michigan, Ann Arbor
learning and queueing, bandits, reinforcement learning, communications
Sushil Mahavir Varma
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor
Lei Ying
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor