Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning

📅 2026-05-14

🤖 AI Summary
This study addresses semiparametric contextual pricing in which both the utility function and the additive noise distribution are unknown. Working in a scalar index model, the authors show that the oracle price map is $(\beta-1)$-smooth when the noise tail is $\beta$-Hölder smooth with $\beta\geq 2$, and they introduce an adaptive elliptical exploration scheme that requires no distributional assumptions on the contexts. Their modular coarse-to-fine policy, ORBIT, localizes a benchmark price in each active bin and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. Under the linear utility model, the policy attains cumulative regret $\widetilde{O}(T^{(2\beta-1)/(4\beta-3)} + \sqrt{dT})$, and for fixed dimension $d$ a lower bound matching in the horizon dependence is established.
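To get a feel for the rate, the short sketch below simply evaluates the horizon exponent $(2\beta-1)/(4\beta-3)$ for a few smoothness levels. The function name `horizon_exponent` is illustrative and not from the paper; logarithmic factors and constants hidden by $\widetilde{O}$ are ignored.

```python
from fractions import Fraction


def horizon_exponent(beta):
    """Exponent of T in the term T^{(2*beta - 1) / (4*beta - 3)}.

    `beta` is the Holder smoothness of the noise tail; the abstract
    assumes beta >= 2, so smaller values are rejected here.
    """
    beta = Fraction(beta)
    if beta < 2:
        raise ValueError("the stated regret bound assumes beta >= 2")
    return (2 * beta - 1) / (4 * beta - 3)


# Print the exact exponent and a decimal approximation for a few values.
for beta in (2, 3, 5, 10):
    e = horizon_exponent(beta)
    print(f"beta={beta}: exponent={e} (about {float(e):.4f})")
```

The exponent equals $3/5$ at $\beta=2$ and decreases toward $1/2$ as $\beta$ grows, exceeding $1/2$ by exactly $1/(2(4\beta-3))$. Ignoring logarithmic factors and constants, this suggests the $\sqrt{dT}$ term takes over only once $d$ is at least of order $T^{1/(4\beta-3)}$.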
📝 Abstract
We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under $\beta$-Hölder smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit this structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, showing that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric Hölder utility.
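As a concrete illustration of the oracle price map, the sketch below assumes the usual posted-price revenue $p\cdot S(p-u)$, where $S$ is the noise tail, and takes standard logistic noise purely as an example; neither the noise choice nor the helper names (`logistic_tail`, `revenue`, `oracle_price`) come from the paper. Because the revenue is unimodal in $p$ under this assumption, a golden-section search is enough to find the maximizer numerically. This only evaluates the map for a known tail; it is not the ORBIT learning procedure.

```python
import math


def logistic_tail(x):
    """S(x) = P(xi > x) for standard logistic noise (an assumed example)."""
    return 1.0 / (1.0 + math.exp(x))


def revenue(p, u, tail=logistic_tail):
    """Expected revenue of posting price p when the scalar index is u."""
    return p * tail(p - u)


def oracle_price(u, tail=logistic_tail, lo=0.0, hi=50.0, tol=1e-9):
    """Golden-section search for argmax_p revenue(p, u) on [lo, hi].

    Relies on the revenue being unimodal in p on the search interval.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if revenue(c, u, tail) > revenue(d, u, tail):
            # The maximizer lies in [a, d]; reuse c as the new right probe.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # The maximizer lies in [c, b]; reuse d as the new left probe.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Evaluate the map on a few index values.
for u in (0.0, 1.0, 2.0):
    print(f"u={u}: p*(u) is about {oracle_price(u):.4f}")
```

For logistic noise the first-order condition reduces to $p-1=e^{u-p}$, so `oracle_price(2.0)` should return a value close to 2, and the map increases smoothly in $u$.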
Problem

Research questions and friction points this paper is trying to address.

contextual pricing
semiparametric model
oracle price map
unimodality
nonparametric learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

oracle price map
semiparametric pricing
unimodality
bandit convex optimization
minimax regret
Yingying Fan
Data Sciences and Operations Department, University of Southern California, Los Angeles, California 90089, USA
Yuxuan Han
Tsinghua University
computer vision, computer graphics
Jinchi Lv
Kenneth King Stonier Chair in Business Administration
AI for Business and Applications, Statistics and Data Science, Machine Learning
Xiaocong Xu
Data Sciences and Operations Department, University of Southern California, Los Angeles, California 90089, USA
Zhengyuan Zhou
Dept of Technology, Operations and Statistics at NYU Stern
reinforcement learning, optimization, game theory, operations research