🤖 AI Summary
This study addresses dynamic pricing from binary accept/reject feedback in online two-sided markets, with the objective of maximizing either gains from trade (GFT) or platform profit. Across two market structures, one-to-many and many-to-many, the work analyzes three classes of mechanisms (Single-Price, Two-Price, and Segmented-Price) and shows that the expressiveness of a mechanism critically influences its learnability. The paper designs a Segmented-Price mechanism that overcomes the linear regret inherent in Two-Price mechanisms, achieving sublinear regret in many-to-many markets. Theoretical analysis establishes a regret bound of $O(n^2 \log\log T)$ for profit maximization, where $n$ is the number of traders; for GFT maximization, it gives $O(\log\log T)$ in one-to-many markets and $O(n^2 \log\log T + n^3)$ in many-to-many markets, with further extensions to contextual settings.
📝 Abstract
We investigate online pricing in two-sided markets where a platform repeatedly posts prices based on binary accept/reject feedback to maximize gains-from-trade (GFT) or profit. We characterize the regret achievable across three mechanism classes: Single-Price, Two-Price, and Segmented-Price. For profit maximization, we design an algorithm using Two-Price Mechanisms that achieves $O(n^2 \log\log T)$ regret, where $n$ is the number of traders. For GFT maximization, the optimal regret depends critically on both market size and mechanism expressiveness. Constant regret is achievable in bilateral trade, but this guarantee breaks down as the market grows: even in a one-seller, two-buyer market, any algorithm using Single-Price Mechanisms suffers regret at least $\Omega\!\big(\frac{\log\log T}{\log\log\log\log T}\big)$, and we provide a nearly matching $O(\log\log T)$ upper bound for general one-to-many markets. In full many-to-many markets, we prove that Two-Price Mechanisms inevitably incur linear regret $\Omega(T)$ due to a \emph{mismatch phenomenon}, wherein inefficient pairings prevent near-optimal trade. To overcome this barrier, we introduce \emph{Segmented-Price Mechanisms}, which partition traders into groups and assign distinct prices per group. Using this richer mechanism, we design an algorithm achieving $O(n^2 \log\log T + n^3)$ regret for GFT maximization. Finally, we extend our results to the contextual setting, where traders' costs and values depend linearly on observed $d$-dimensional features that vary across rounds, obtaining regret bounds of $O(n^2 d \log\log T + n^2 d \log d)$ for profit and $O(n^2 d^2 \log T)$ for GFT. Our work delineates sharp boundaries between learnable and unlearnable regimes in two-sided dynamic pricing and demonstrates how modest increases in pricing expressiveness can circumvent fundamental hardness barriers.
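To make the feedback model concrete, the following is a minimal sketch of one round of bilateral trade (one seller, one buyer) under a single posted price. It is an illustration under stated assumptions rather than the paper's algorithm: it assumes the seller accepts when the price is at least their cost, the buyer accepts when the price is at most their value, and per-round regret is measured against the best achievable GFT. The names `Feedback`, `post_price`, `gains_from_trade`, and `round_regret` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """Binary accept/reject signals: all the platform observes in a round."""
    seller_accepts: bool
    buyer_accepts: bool


def post_price(price: float, seller_cost: float, buyer_value: float) -> Feedback:
    # Assumption: the seller accepts iff price >= cost,
    # and the buyer accepts iff price <= value.
    return Feedback(
        seller_accepts=seller_cost <= price,
        buyer_accepts=price <= buyer_value,
    )


def gains_from_trade(price: float, seller_cost: float, buyer_value: float) -> float:
    # Trade happens only if both sides accept; the realized GFT is then
    # value minus cost, independent of where the price sits in between.
    fb = post_price(price, seller_cost, buyer_value)
    if fb.seller_accepts and fb.buyer_accepts:
        return buyer_value - seller_cost
    return 0.0


def round_regret(price: float, seller_cost: float, buyer_value: float) -> float:
    # Assumed benchmark: the best GFT attainable in this round,
    # which is zero when the cost exceeds the value.
    best = max(buyer_value - seller_cost, 0.0)
    return best - gains_from_trade(price, seller_cost, buyer_value)
```

For example, with cost 0.25 and value 0.75, any price in [0.25, 0.75] yields zero regret, while a price of 0.9 is rejected by the buyer and loses the full 0.5 of available surplus. The platform never sees the cost or value directly, only the two accept/reject bits, which is what makes the learning problem nontrivial.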