🤖 AI Summary
This paper studies decentralized online price competition among multiple sellers under generalized linear (single-index) demand: each seller observes only its own demand and competitors’ prices, receives binary or real-valued demand feedback, and operates without coordinated exploration. We propose PML-GLUCB, a novel algorithm integrating penalized maximum likelihood estimation with a generalized upper-confidence-bound pricing rule, achieving the first fully distributed learning guarantee for nonlinear single-index demand models. To accommodate the multi-agent competitive structure, we refine the elliptical potential lemma. Theoretically, PML-GLUCB attains an $O(N^2 \sqrt{T} \log T)$ regret bound against a dynamic benchmark, matching the optimal rate up to logarithmic factors in the linear-demand setting. This significantly extends the applicability and practicality of existing approaches to broader, more realistic demand specifications.
📝 Abstract
We study sequential price competition among $N$ sellers, each influenced by the pricing decisions of their rivals. Specifically, the demand function for each seller $i$ follows the single index model $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$, with known increasing link $\mu_i$ and unknown parameter $\boldsymbol{\theta}_{i,0}$, where the vector $\mathbf{p}$ denotes the vector of prices offered by all the sellers simultaneously at a given instant. Each seller observes only their own realized demand -- unobservable to competitors -- and the prices set by rivals. Our framework generalizes existing approaches that focus solely on linear demand models. We propose a novel decentralized policy, PML-GLUCB, that combines penalized MLE with an upper-confidence pricing rule, removing the need for coordinated exploration phases across sellers -- which is integral to previous linear models -- and accommodating both binary and real-valued demand observations. Relative to a dynamic benchmark policy, each seller achieves $O(N^{2}\sqrt{T}\log(T))$ regret, which essentially matches the optimal rate known in the linear setting. A significant technical contribution of our work is the development of a variant of the elliptical potential lemma -- typically applied in single-agent systems -- adapted to our competitive multi-agent environment.
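
To make the single index demand model concrete, the following is a minimal Python sketch of one seller's expected demand and binary demand feedback. It is an illustration under stated assumptions, not the paper's implementation: the logistic link, the parameter values in `theta_0`, the price vector, and all function names are hypothetical choices, and the sketch covers only the demand model, not the PML-GLUCB policy itself.

```python
import math
import random


def logistic(z):
    """Assumed example of a known increasing link: mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_demand(theta_i, prices, link=logistic):
    """Single index demand lambda_i(p) = mu_i(<theta_i, p>)."""
    index = sum(t * p for t, p in zip(theta_i, prices))
    return link(index)


def sample_binary_demand(theta_i, prices, rng):
    """Binary feedback: returns 1 with probability lambda_i(p), else 0."""
    return 1 if rng.random() < expected_demand(theta_i, prices) else 0


# Hypothetical parameter for seller 0 among N = 3 sellers. The negative
# own-price coefficient and positive rival coefficients are assumptions
# made for illustration only.
theta_0 = [-1.5, 0.4, 0.3]
prices = [1.0, 1.2, 0.8]  # prices posted by all sellers in one round

rng = random.Random(0)
lam = expected_demand(theta_0, prices)  # seller 0's expected demand
revenue = prices[0] * lam               # seller 0's expected revenue
draws = [sample_binary_demand(theta_0, prices, rng) for _ in range(1000)]
```

In this sketch, seller 0 would see only its own `draws` and the full `prices` vector, which mirrors the information structure described above; the true `theta_0` would be unknown to the seller and would have to be estimated from that feedback.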