🤖 AI Summary
This paper studies the joint optimization of pricing and matching in two-sided markets with unknown supply and demand, aiming to maximize platform profit while controlling queue length. To address heterogeneous price sensitivities among users and service providers, we propose a novel online learning policy comprising two integrated components: dynamic optimization and probabilistic sampling. This design enables real-time estimation of the unknown demand/supply curves while jointly balancing learning exploration and system stability. We theoretically establish that, for $\gamma \in (0, 1/6]$, the policy achieves a near-optimal trade-off between regret and average queue length—namely, $\tilde{O}(T^{1-\gamma})$ regret and $\tilde{O}(T^{\gamma/2})$ average queue length. Our analysis is the first to explicitly reveal and quantify the fundamental tension between learning exploration and queue stability. Moreover, the resulting regret–queue-length trade-off strictly dominates those attained by existing methods.
📝 Abstract
We study a two-sided market in which price-sensitive heterogeneous customers and servers arrive and join their respective queues. A compatible customer–server pair can then be matched by the platform, at which point both parties leave the system. Our objective is to design pricing and matching algorithms that maximize the platform's profit while maintaining reasonable queue lengths. As the demand and supply curves governing the price-dependent arrival rates may not be known in practice, we design a novel online-learning-based pricing policy and establish its near-optimality. In particular, we prove a tradeoff among three performance metrics: $\tilde{O}(T^{1-\gamma})$ regret, $\tilde{O}(T^{\gamma/2})$ average queue length, and $\tilde{O}(T^{\gamma})$ maximum queue length for $\gamma \in (0, 1/6]$, significantly improving over existing results [1]. Moreover, apart from the restriction on the permissible range of $\gamma$, we show that this trade-off between regret and average queue length is optimal up to logarithmic factors within a class of policies, matching the optimal trade-off established in [2], which assumes the demand and supply curves to be known. Our proposed policy has two noteworthy features: a dynamic component that optimizes the tradeoff between low regret and small queue lengths, and a probabilistic component that resolves the tension between obtaining useful samples for fast learning and maintaining small queue lengths.
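To make the market model concrete, the following is a minimal simulation sketch of the setting described above: customers and servers arrive according to price-dependent Bernoulli processes, the platform greedily matches compatible pairs, and profit is the price–wage margin per match. The linear demand/supply curves, the static price and wage, and all numeric parameters are hypothetical illustrations, not the paper's policy (which learns the unknown curves online and adapts prices dynamically).

```python
import random

def simulate(T=10000, seed=0):
    """Toy two-sided queue with price-sensitive arrivals and greedy matching.

    Illustration of the market model only; the demand/supply curves below
    are assumed linear and known here, unlike in the paper's setting.
    """
    rng = random.Random(seed)
    # Hypothetical curves: demand falls in the posted price p,
    # supply rises in the posted wage w (both clipped to [0, 1]).
    demand = lambda p: max(0.0, min(1.0, 0.9 - 0.8 * p))
    supply = lambda w: max(0.0, min(1.0, 0.1 + 0.8 * w))
    p, w = 0.6, 0.4          # static price/wage chosen so arrival rates balance
    q_c = q_s = 0            # customer and server queue lengths
    profit = 0.0
    total_q = 0
    for _ in range(T):
        if rng.random() < demand(p):   # customer arrival
            q_c += 1
        if rng.random() < supply(w):   # server arrival
            q_s += 1
        m = min(q_c, q_s)              # greedily match compatible pairs
        q_c -= m
        q_s -= m
        profit += m * (p - w)          # platform margin per matched pair
        total_q += q_c + q_s
    return profit, total_q / T

profit, avg_q = simulate()
print(f"profit={profit:.1f}, average queue length={avg_q:.2f}")
```

With balanced arrival rates the unmatched side behaves like a random walk, so even this static policy keeps queues moderate; the paper's contribution is achieving a provably near-optimal regret/queue-length trade-off while simultaneously learning the unknown curves.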